timeeval.datasets package
timeeval.datasets.analyzer
- class timeeval.datasets.analyzer.DatasetAnalyzer(dataset_id: Tuple[str, str], is_train: bool, df: Optional[DataFrame] = None, dataset_path: Optional[Path] = None, dmgr: Optional[Datasets] = None, ignore_stationarity: bool = False, ignore_trend: bool = False)¶
Utility class to analyze a dataset and infer metadata about the dataset.
Use this class to compute necessary metadata from a time series. The computation starts as soon as the class is instantiated. You can access the results using the property metadata. There are multiple ways to instantiate this class, but you always have to specify the dataset ID, because it is part of the metadata:
- Use an existing pandas data frame object. Supply a value to the parameter df.
- Use a path to a time series. Supply a value to the parameter dataset_path.
- Use a dataset ID and a reference to the dataset manager. Supply a value to the parameter dmgr.
This class computes simple metadata, such as number of anomalies, mean, and standard deviation, as well as advanced metadata, such as trends or stationarity information for all time series channels. The simple metadata is exact, whereas the advanced metadata is estimated from the observed time series data. The trend is computed by fitting linear regression models of different order to the time series. If a regression correlates strongly enough with the observed values, the trend and its confidence are recorded. The stationarity of the time series is estimated using two statistical tests, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test and the Augmented Dickey-Fuller (ADF) test.
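As a rough illustration of the trend estimation described above, the following sketch fits a first-order (straight-line) regression to a series and uses the coefficient of determination R² as the confidence. It is not the library's implementation: the helper `fit_line` and the cut-off `R2_THRESHOLD` are hypothetical, and only the first-order case is shown.

```python
from typing import List, Tuple


def fit_line(values: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = slope * t + intercept over t = 0..n-1.

    Returns (slope, intercept, r2). Requires at least two values.
    """
    n = len(values)
    ts = list(range(n))
    t_mean = sum(ts) / n
    y_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    ss_res = sum((y - (slope * t + intercept)) ** 2 for t, y in zip(ts, values))
    ss_tot = sum((y - y_mean) ** 2 for y in values)
    # A constant series has no variance to explain; report zero confidence.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2


R2_THRESHOLD = 0.8  # hypothetical cut-off, not the value used by TimeEval

# Synthetic channel: slope 0.5 plus a small alternating disturbance.
series = [0.5 * t + (1.0 if t % 2 else -1.0) for t in range(100)]
slope, intercept, r2 = fit_line(series)
has_linear_trend = r2 >= R2_THRESHOLD
```

A higher-order trend could be checked the same way by fitting a polynomial instead of a line and comparing the resulting R² values.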
The metadata of a dataset can be stored to disk. This class provides utility functions to create a JSON-file per dataset, containing the metadata about the test time series and the optional training time series.
- Parameters
dataset_id (tuple of str, str) – ID of the dataset consisting of collection and dataset name.
is_train (bool) – Whether the analyzed time series is the training (True) or the testing (False) time series of the dataset.
df (data frame, optional) – Time series data frame. If df is supplied, you can omit dataset_path and dmgr.
dataset_path (path, optional) – Path to the time series. If dataset_path is supplied, you can omit df and dmgr.
dmgr (Datasets object, optional) – Dataset manager instance that is used to load the time series if df and dataset_path are not specified.
ignore_stationarity (bool, optional) – Don’t estimate the stationarity of the time series’ channels. This might be necessary for large datasets, because this step takes a lot of time.
ignore_trend (bool, optional) – Don’t estimate the trend type of the time series’ channels. This might be necessary for large datasets, because this step takes a lot of time.
- static load_from_json(filename: Union[str, Path], train: bool = False) DatasetMetadata¶
Loads existing time series metadata from disk.
If there are multiple metadata entries with the same dataset ID and training/testing-label, the first entry is used.
- Parameters
filename (path) – Path to the JSON-file containing the dataset metadata. Can be written using timeeval.datasets.analyzer.DatasetAnalyzer.save_to_json().
train (bool) – Whether the training or testing time series’ metadata should be loaded from the file.
- Returns
metadata – Metadata of the training or testing time series.
- Return type
time series metadata object
- property metadata: DatasetMetadata¶
Returns the computed metadata about the time series.
- save_to_json(filename: Union[str, Path], overwrite: bool = False) None¶
Save the computed metadata for a dataset to disk.
This method writes a dataset’s metadata to a JSON-formatted file on disk. The file contains a list of metadata specifications: one for the test time series and potentially another one for the train time series. Since the DatasetAnalyzer analyzes just a single time series at a time, this method appends the current metadata to the existing list by default. If you want to overwrite the existing content of the file, you can use the parameter overwrite.
- Parameters
filename (path) – Path to the file where the metadata should be written. The file might already exist.
overwrite (bool) – Whether existing data in the file should be overwritten (True) or the current metadata should be appended to it (False).
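The append-by-default behavior of save_to_json and the first-match rule of load_from_json can be sketched with the standard library alone. The helpers `save_entry` and `load_entry` and the file name are hypothetical; the real file layout may differ, and the dictionary keys simply reuse field names of DatasetMetadata.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_entry(filename: Path, entry: Dict[str, Any], overwrite: bool = False) -> None:
    """Append entry to the JSON list in filename, or start a new list if overwrite is set."""
    entries: List[Dict[str, Any]] = []
    if filename.exists() and not overwrite:
        entries = json.loads(filename.read_text())
    entries.append(entry)
    filename.write_text(json.dumps(entries, indent=2))


def load_entry(filename: Path, train: bool = False) -> Dict[str, Any]:
    """Return the first entry whose is_train flag matches train."""
    entries = json.loads(filename.read_text())
    return next(e for e in entries if e["is_train"] == train)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset1.metadata.json"  # hypothetical file name
    dataset_id = ["custom", "dataset1"]
    save_entry(path, {"dataset_id": dataset_id, "is_train": False, "length": 1000})
    save_entry(path, {"dataset_id": dataset_id, "is_train": True, "length": 500})
    n_after_append = len(json.loads(path.read_text()))
    train_length = load_entry(path, train=True)["length"]
    # With overwrite=True the previous two entries are discarded.
    save_entry(path, {"dataset_id": dataset_id, "is_train": False, "length": 42}, overwrite=True)
    n_after_overwrite = len(json.loads(path.read_text()))
```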
timeeval.datasets.custom
- class timeeval.datasets.custom.CDEntry(test_path, train_path, details)
Bases: NamedTuple
- class timeeval.datasets.custom.CustomDatasets(dataset_config: Union[str, Path])
Bases: CustomDatasetsBase
Implementation of the custom datasets API.
Internal API! You should not need to use or modify this class.
This class behaves similarly to the timeeval.datasets.datasets.Datasets-API while using a different internal representation for the dataset index.
- select(collection: Optional[str] = None, dataset: Optional[str] = None, dataset_type: Optional[str] = None, datetime_index: Optional[bool] = None, training_type: Optional[TrainingType] = None, train_is_normal: Optional[bool] = None, input_dimensionality: Optional[InputDimensionality] = None, min_anomalies: Optional[int] = None, max_anomalies: Optional[int] = None, max_contamination: Optional[float] = None) List[Tuple[str, str]]
timeeval.datasets.custom_base
- class timeeval.datasets.custom_base.CustomDatasetsBase
Bases: ABC
API definition for custom datasets.
Internal API! You should not need to use or modify this class.
- abstract select(collection: Optional[str] = None, dataset: Optional[str] = None, dataset_type: Optional[str] = None, datetime_index: Optional[bool] = None, training_type: Optional[TrainingType] = None, train_is_normal: Optional[bool] = None, input_dimensionality: Optional[InputDimensionality] = None, min_anomalies: Optional[int] = None, max_anomalies: Optional[int] = None, max_contamination: Optional[float] = None) List[Tuple[str, str]]¶
timeeval.datasets.custom_noop
- class timeeval.datasets.custom_noop.NoOpCustomDatasets
Bases: CustomDatasetsBase
Dummy implementation of the CustomDatasets interface.
Internal API! You should not need to use or modify this class.
This dummy implementation does nothing and improves readability of the timeeval.datasets.datasets.Datasets-implementation by removing the need for None-checks.
- select(collection: Optional[str] = None, dataset: Optional[str] = None, dataset_type: Optional[str] = None, datetime_index: Optional[bool] = None, training_type: Optional[TrainingType] = None, train_is_normal: Optional[bool] = None, input_dimensionality: Optional[InputDimensionality] = None, min_anomalies: Optional[int] = None, max_anomalies: Optional[int] = None, max_contamination: Optional[float] = None) List[Tuple[str, str]]
timeeval.datasets.dataset
- class timeeval.datasets.dataset.Dataset(datasetId: Tuple[str, str], dataset_type: str, training_type: TrainingType, length: int, dimensions: int, contamination: float, min_anomaly_length: int, median_anomaly_length: int, max_anomaly_length: int, period_size: Optional[int] = None, num_anomalies: Optional[int] = None)
Bases: object
Dataset information containing basic metadata about the dataset.
This class is used within TimeEval heuristics to determine the heuristic values based on the dataset properties.
- property input_dimensionality: InputDimensionality¶
- training_type: TrainingType¶
timeeval.datasets.dataset_manager
- class timeeval.datasets.dataset_manager.DatasetManager(data_folder: Union[str, Path], custom_datasets_file: Optional[Union[str, Path]] = None, create_if_missing: bool = True)
Bases: ContextManager[DatasetManager], Datasets
Manages benchmark datasets and their meta-information.
Manages dataset collections and their meta-information that are stored in a single folder with an index file. You can also use this class to create a new TimeEval dataset collection.
Warning
ATTENTION: Not multi-processing-safe! There is no check for changes to the underlying datasets.csv file while this class is loaded.
Read-only access is fine with multiple processes.
- Parameters
data_folder (path) – Path to the folder where the benchmark data is stored. This folder consists of the file datasets.csv and the datasets in a hierarchical storage layout.
custom_datasets_file (path) – Path to a file listing additional custom datasets.
create_if_missing (bool) – Create an index-file in the data_folder if none could be found. Set this to False if an exception should be raised if the folder is wrong or does not exist.
- Raises
FileNotFoundError – If create_if_missing is set to False and no datasets.csv-file was found in the data_folder.
See also
timeeval.datasets.datasets.Datasets, timeeval.datasets.multi_dataset_manager.MultiDatasetManager
- add_dataset(dataset: DatasetRecord) None
Adds a new dataset to the benchmark dataset collection (in-memory).
The provided dataset metadata is added to this dataset collection (to the in-memory index). You can save the in-memory index to disk using the timeeval.datasets.DatasetManager.save()-method. The referenced time series files (training and testing paths) are not touched. If the same dataset ID (collection_name, dataset_name) as an existing dataset is specified, its entries are overwritten!
- Parameters
dataset (DatasetRecord object) – The dataset information to add to the benchmark collection.
- add_datasets(datasets: List[DatasetRecord]) None¶
Add a list of datasets to the dataset collection.
Add a list of new datasets to the benchmark dataset collection (in-memory). Already existing keys are overwritten!
- Parameters
datasets (list of DatasetRecord objects) – List of dataset metadata to add to this dataset collection.
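The overwrite-on-duplicate behavior can be pictured as an index keyed by the dataset ID. The sketch below uses a plain dictionary and a reduced, hypothetical `MiniRecord` instead of the real DatasetRecord; how the library stores its index internally is not shown here.

```python
from typing import Dict, List, NamedTuple, Tuple


class MiniRecord(NamedTuple):
    """Hypothetical subset of the DatasetRecord fields."""
    collection_name: str
    dataset_name: str
    test_path: str


def add_datasets(index: Dict[Tuple[str, str], MiniRecord], datasets: List[MiniRecord]) -> None:
    """Add records to the in-memory index; an existing dataset ID is replaced."""
    for record in datasets:
        index[(record.collection_name, record.dataset_name)] = record


index: Dict[Tuple[str, str], MiniRecord] = {}
add_datasets(index, [MiniRecord("coll", "ds1", "old.csv"), MiniRecord("coll", "ds2", "b.csv")])
# Same ID ("coll", "ds1") again: the earlier entry is overwritten, not duplicated.
add_datasets(index, [MiniRecord("coll", "ds1", "new.csv")])
```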
- df() DataFrame¶
Returns a copy of the internal dataset metadata collection.
The DataFrame has the following schema:
- Index:
dataset_name, collection_name
- Columns:
train_path, test_path, dataset_type, datetime_index, split_at, train_type, train_is_normal, input_type, length, dimensions, contamination, num_anomalies, min_anomaly_length, median_anomaly_length, max_anomaly_length, mean, stddev, trend, stationarity, period_size
- Returns
df – All custom and benchmark datasets and their metadata.
- Return type
data frame
- get(collection_name: Union[str, Tuple[str, str]], dataset_name: Optional[str] = None) Dataset¶
Returns dataset metadata.
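The signature accepts either a complete dataset ID tuple or a collection name followed by a dataset name. One way such arguments could be normalized is sketched below; `normalize_dataset_id` is a hypothetical helper and not part of the TimeEval API.

```python
from typing import Optional, Tuple, Union


def normalize_dataset_id(
    collection_name: Union[str, Tuple[str, str]],
    dataset_name: Optional[str] = None,
) -> Tuple[str, str]:
    """Turn both calling conventions into a (collection, dataset) tuple."""
    if isinstance(collection_name, tuple):
        return collection_name
    if dataset_name is None:
        raise ValueError("dataset_name is required when collection_name is a string")
    return (collection_name, dataset_name)


id_from_tuple = normalize_dataset_id(("custom", "dataset1"))
id_from_names = normalize_dataset_id("custom", "dataset1")
```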
Examples
>>> from timeeval.datasets import DatasetManager
>>> dm = DatasetManager("path/to/datasets")
>>> dataset_id = ("custom", "dataset1")
Access using the dataset ID:
>>> dm.get(dataset_id)
Dataset(datasetId=("custom", "dataset1"), ...)
Access using collection and dataset name:
>>> dm.get("custom", "dataset1")
Dataset(datasetId=("custom", "dataset1"), ...)
- get_collection_names() List[str]¶
Returns the unique dataset collection names (includes custom datasets if present).
- get_dataset_df(dataset_id: Tuple[str, str], train: bool = False) DataFrame¶
Loads the training/testing time series as a data frame.
- Parameters
- Returns
df – The training or testing time series as a pandas.DataFrame.
- Return type
data frame
- get_dataset_names() List[str]¶
Returns the unique dataset names (includes custom datasets if present).
- get_dataset_ndarray(dataset_id: Tuple[str, str], train: bool = False) ndarray¶
Loads the training/testing time series as a multi-dimensional array.
- get_dataset_path(dataset_id: Tuple[str, str], train: bool = False) Path¶
Returns the path to the training/testing time series of the dataset.
- get_detailed_metadata(dataset_id: Tuple[str, str], train: bool = False) DatasetMetadata¶
Computes detailed metadata about the training or testing time series of a dataset.
For most of the benchmark datasets, the detailed metadata is pre-computed and just has to be loaded from disk. For all other datasets, the time series is analyzed on the fly using timeeval.datasets.DatasetAnalyzer and the result is saved back to disk for later reuse. The metadata about custom datasets is not cached on disk! The following additional metadata is provided:
- Information about the training time series, if train=True is specified.
- Mean, variance, trend, and stationarity information for each channel of the time series individually.
- Parameters
- Returns
metadata – Detailed metadata about the training or testing time series.
- Return type
dataset metadata object
See also
timeeval.datasets.DatasetAnalyzer – Utility class used for the extraction of metadata.
timeeval.datasets.DatasetMetadata – Data class of the returned result.
- get_training_type(dataset_id: Tuple[str, str]) TrainingType¶
Returns the training type of a specific dataset.
- Parameters
dataset_id (tuple of str, str) – Dataset ID (collection and dataset name)
- Returns
training_type – Either unsupervised, semi-supervised, or supervised.
- Return type
TrainingType enum
See also
timeeval.TrainingType – Enumeration of training types that could be returned by this method.
- load_custom_datasets(file_path: Union[str, Path]) None¶
Reads a configuration file that contains additional datasets and adds them to the current dataset index.
You can add custom datasets to the dataset manager either using a constructor argument or using this method. The datasets from the configuration file are added to the internal dataset index and are then available for querying. The configuration file uses the JSON format and supports this structure:
{
  "dataset_name": {
    "test_path": "./path/to/test.csv",
    "train_path": "./path/to/train.csv",
    "type": "synthetic",
    "period": 10
  }
}
The properties train_path, type, and period are optional. Dataset names must be unique within the configuration file. The datasets are automatically assigned to the custom dataset collection.
Warning
Repeated calls to this method overwrite the existing custom dataset list.
- Parameters
file_path (
path) – Path to the custom dataset configuration file.
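A configuration file with the structure described above can be produced with the standard library. The sketch below writes one and checks that each entry has the mandatory test_path. The helper `check_config`, the dataset names, and all file names are hypothetical. Pass the resulting path to load_custom_datasets or to the constructor.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

config: Dict[str, Dict[str, object]] = {
    "dataset1": {
        "test_path": "./path/to/test.csv",
        "train_path": "./path/to/train.csv",
        "type": "synthetic",
        "period": 10,
    },
    # Only test_path is mandatory; train_path, type, and period are optional.
    "dataset2": {"test_path": "./path/to/other-test.csv"},
}


def check_config(cfg: Dict[str, Dict[str, object]]) -> List[Tuple[str, str]]:
    """Return the dataset IDs the entries would get in the custom collection."""
    ids = []
    for name, entry in cfg.items():
        if "test_path" not in entry:
            raise ValueError(f"{name}: 'test_path' is required")
        ids.append(("custom", name))
    return ids


with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "custom_datasets.json"  # hypothetical file name
    config_file.write_text(json.dumps(config, indent=2))
    dataset_ids = check_config(json.loads(config_file.read_text()))
```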
- save() None¶
Saves the in-memory dataset index to disk.
Persists newly added benchmark datasets from memory to the benchmark dataset collection file datasets.csv. Custom datasets are excluded from persistence and cannot be saved to disk; use add_dataset() or add_datasets() to add datasets to the benchmark dataset collection.
- select(collection: Optional[str] = None, dataset: Optional[str] = None, dataset_type: Optional[str] = None, datetime_index: Optional[bool] = None, training_type: Optional[TrainingType] = None, train_is_normal: Optional[bool] = None, input_dimensionality: Optional[InputDimensionality] = None, min_anomalies: Optional[int] = None, max_anomalies: Optional[int] = None, max_contamination: Optional[float] = None) List[Tuple[str, str]]¶
Returns a list of dataset identifiers from the dataset collection whose datasets match all of the given conditions.
- Parameters
collection (str) – restrict datasets to a specific collection
dataset (str) – restrict datasets to a specific name
dataset_type (str) – restrict dataset type (e.g. real or synthetic)
datetime_index (bool) – only select datasets for which a datetime index exists; if True: “timestamp”-column has datetime values; if False: “timestamp”-column has monotonically increasing integer values; this condition is ignored by custom datasets.
training_type (timeeval.TrainingType) – select datasets for specific training needs: SUPERVISED, SEMI_SUPERVISED, or UNSUPERVISED
train_is_normal (bool) – if True: only return datasets for which the training dataset does not contain anomalies; if False: only return datasets for which the training dataset contains anomalies
input_dimensionality (timeeval.InputDimensionality) – restrict dataset to input type, either univariate or multivariate
min_anomalies (int) – restrict datasets to those with at least min_anomalies anomalous subsequences
max_anomalies (int) – restrict datasets to those with at most max_anomalies anomalous subsequences
max_contamination (float) – restrict datasets to those having a contamination smaller than or equal to max_contamination
- Returns
dataset_names – A list of dataset identifiers (tuple of collection name and dataset name).
- Return type
List[Tuple[str,str]]
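The matching rule of select (every supplied condition must hold, and a condition left at None imposes no restriction) can be sketched on a small in-memory index. `IndexRow` and `select_ids` are hypothetical stand-ins that cover only a few of the filters; they do not reproduce the library's implementation.

```python
from typing import List, NamedTuple, Optional, Tuple


class IndexRow(NamedTuple):
    """Hypothetical, reduced row of the dataset index."""
    collection: str
    dataset: str
    dataset_type: str
    num_anomalies: int
    contamination: float


def select_ids(
    rows: List[IndexRow],
    collection: Optional[str] = None,
    dataset_type: Optional[str] = None,
    min_anomalies: Optional[int] = None,
    max_anomalies: Optional[int] = None,
    max_contamination: Optional[float] = None,
) -> List[Tuple[str, str]]:
    """Return the IDs of all rows that satisfy every condition that is not None."""
    selected = []
    for row in rows:
        if collection is not None and row.collection != collection:
            continue
        if dataset_type is not None and row.dataset_type != dataset_type:
            continue
        if min_anomalies is not None and row.num_anomalies < min_anomalies:
            continue
        if max_anomalies is not None and row.num_anomalies > max_anomalies:
            continue
        if max_contamination is not None and row.contamination > max_contamination:
            continue
        selected.append((row.collection, row.dataset))
    return selected


rows = [
    IndexRow("A", "d1", "real", 3, 0.01),
    IndexRow("A", "d2", "synthetic", 10, 0.20),
    IndexRow("B", "d3", "real", 1, 0.05),
]
real_and_clean = select_ids(rows, dataset_type="real", max_contamination=0.05)
many_anomalies_in_a = select_ids(rows, collection="A", min_anomalies=5)
everything = select_ids(rows)
```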
- class timeeval.datasets.dataset_manager.DatasetRecord(collection_name, dataset_name, train_path, test_path, dataset_type, datetime_index, split_at, train_type, train_is_normal, input_type, length, dimensions, contamination, num_anomalies, min_anomaly_length, median_anomaly_length, max_anomaly_length, mean, stddev, trend, stationarity, period_size)¶
Bases: NamedTuple
- count(value, /)
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
timeeval.datasets.datasets
- class timeeval.datasets.datasets.Datasets(df: DataFrame, custom_datasets_file: Optional[Union[str, Path]] = None)
Bases: ABC
Provides read-only access to benchmark datasets and their metadata.
This is an abstract class (interface). Please use timeeval.datasets.dataset_manager.DatasetManager or timeeval.datasets.multi_dataset_manager.MultiDatasetManager instead. The constructor arguments are filled in by the respective implementation.
- Parameters
df (pandas DataFrame) – Metadata of all loaded datasets.
custom_datasets_file (pathlib.Path or str) – Path to a file listing additional custom datasets.
- df() DataFrame¶
Returns a copy of the internal dataset metadata collection.
The DataFrame has the following schema:
- Index:
dataset_name, collection_name
- Columns:
train_path, test_path, dataset_type, datetime_index, split_at, train_type, train_is_normal, input_type, length, dimensions, contamination, num_anomalies, min_anomaly_length, median_anomaly_length, max_anomaly_length, mean, stddev, trend, stationarity, period_size
- Returns
df – All custom and benchmark datasets and their metadata.
- Return type
data frame
- get(collection_name: Union[str, Tuple[str, str]], dataset_name: Optional[str] = None) Dataset¶
Returns dataset metadata.
Examples
>>> from timeeval.datasets import DatasetManager
>>> dm = DatasetManager("path/to/datasets")
>>> dataset_id = ("custom", "dataset1")
Access using the dataset ID:
>>> dm.get(dataset_id)
Dataset(datasetId=("custom", "dataset1"), ...)
Access using collection and dataset name:
>>> dm.get("custom", "dataset1")
Dataset(datasetId=("custom", "dataset1"), ...)
- get_collection_names() List[str]¶
Returns the unique dataset collection names (includes custom datasets if present).
- get_dataset_df(dataset_id: Tuple[str, str], train: bool = False) DataFrame¶
Loads the training/testing time series as a data frame.
- Parameters
- Returns
df – The training or testing time series as a pandas.DataFrame.
- Return type
data frame
- get_dataset_names() List[str]¶
Returns the unique dataset names (includes custom datasets if present).
- get_dataset_ndarray(dataset_id: Tuple[str, str], train: bool = False) ndarray¶
Loads the training/testing time series as a multi-dimensional array.
- get_dataset_path(dataset_id: Tuple[str, str], train: bool = False) Path¶
Returns the path to the training/testing time series of the dataset.
- get_detailed_metadata(dataset_id: Tuple[str, str], train: bool = False) DatasetMetadata¶
Computes detailed metadata about the training or testing time series of a dataset.
For most of the benchmark datasets, the detailed metadata is pre-computed and just has to be loaded from disk. For all other datasets, the time series is analyzed on the fly using timeeval.datasets.DatasetAnalyzer and the result is saved back to disk for later reuse. The metadata about custom datasets is not cached on disk! The following additional metadata is provided:
- Information about the training time series, if train=True is specified.
- Mean, variance, trend, and stationarity information for each channel of the time series individually.
- Parameters
- Returns
metadata – Detailed metadata about the training or testing time series.
- Return type
dataset metadata object
See also
timeeval.datasets.DatasetAnalyzer – Utility class used for the extraction of metadata.
timeeval.datasets.DatasetMetadata – Data class of the returned result.
- get_training_type(dataset_id: Tuple[str, str]) TrainingType¶
Returns the training type of a specific dataset.
- Parameters
dataset_id (tuple of str, str) – Dataset ID (collection and dataset name)
- Returns
training_type – Either unsupervised, semi-supervised, or supervised.
- Return type
TrainingType enum
See also
timeeval.TrainingType – Enumeration of training types that could be returned by this method.
- load_custom_datasets(file_path: Union[str, Path]) None¶
Reads a configuration file that contains additional datasets and adds them to the current dataset index.
You can add custom datasets to the dataset manager either using a constructor argument or using this method. The datasets from the configuration file are added to the internal dataset index and are then available for querying. The configuration file uses the JSON format and supports this structure:
{
  "dataset_name": {
    "test_path": "./path/to/test.csv",
    "train_path": "./path/to/train.csv",
    "type": "synthetic",
    "period": 10
  }
}
The properties train_path, type, and period are optional. Dataset names must be unique within the configuration file. The datasets are automatically assigned to the custom dataset collection.
Warning
Repeated calls to this method overwrite the existing custom dataset list.
- Parameters
file_path (
path) – Path to the custom dataset configuration file.
- abstract refresh(force: bool = False) None¶
Re-read the benchmark dataset collection information from disk.
- select(collection: Optional[str] = None, dataset: Optional[str] = None, dataset_type: Optional[str] = None, datetime_index: Optional[bool] = None, training_type: Optional[TrainingType] = None, train_is_normal: Optional[bool] = None, input_dimensionality: Optional[InputDimensionality] = None, min_anomalies: Optional[int] = None, max_anomalies: Optional[int] = None, max_contamination: Optional[float] = None) List[Tuple[str, str]]¶
Returns a list of dataset identifiers from the dataset collection whose datasets match all of the given conditions.
- Parameters
collection (str) – restrict datasets to a specific collection
dataset (str) – restrict datasets to a specific name
dataset_type (str) – restrict dataset type (e.g. real or synthetic)
datetime_index (bool) – only select datasets for which a datetime index exists; if True: “timestamp”-column has datetime values; if False: “timestamp”-column has monotonically increasing integer values; this condition is ignored by custom datasets.
training_type (timeeval.TrainingType) – select datasets for specific training needs: SUPERVISED, SEMI_SUPERVISED, or UNSUPERVISED
train_is_normal (bool) – if True: only return datasets for which the training dataset does not contain anomalies; if False: only return datasets for which the training dataset contains anomalies
input_dimensionality (timeeval.InputDimensionality) – restrict dataset to input type, either univariate or multivariate
min_anomalies (int) – restrict datasets to those with at least min_anomalies anomalous subsequences
max_anomalies (int) – restrict datasets to those with at most max_anomalies anomalous subsequences
max_contamination (float) – restrict datasets to those having a contamination smaller than or equal to max_contamination
- Returns
dataset_names – A list of dataset identifiers (tuple of collection name and dataset name).
- Return type
List[Tuple[str,str]]
timeeval.datasets.metadata
- class timeeval.datasets.metadata.DatasetMetadata(dataset_id: Tuple[str, str], is_train: bool, length: int, dimensions: int, contamination: float, num_anomalies: int, anomaly_length: AnomalyLength, means: Dict[str, float], stddevs: Dict[str, float], trends: Dict[str, List[Trend]], stationarities: Dict[str, Stationarity])
Bases: object
Represents the metadata of a single time series of a dataset (for each channel).
- anomaly_length: AnomalyLength¶
- static from_json(s: str) DatasetMetadata¶
- stationarities: Dict[str, Stationarity]¶
- property stationarity: Stationarity¶
- class timeeval.datasets.metadata.DatasetMetadataEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)¶
Bases: JSONEncoder
- default(o: Any) Any
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
- class timeeval.datasets.metadata.Stationarity(value)¶
Bases: Enum
An enumeration.
- DIFFERENCE_STATIONARY = 1¶
- NOT_STATIONARY = 3¶
- STATIONARY = 0¶
- TREND_STATIONARY = 2¶
- static from_name(s: int) Stationarity¶
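The four members line up with a common way of reading the ADF and KPSS tests together. The sketch below shows that interpretation. Whether DatasetAnalyzer combines the two test outcomes in exactly this way is an assumption, `combine` is a hypothetical helper, and the enum is redefined locally only to keep the example self-contained.

```python
from enum import Enum


class Stationarity(Enum):
    """Local mirror of the enum members listed above."""
    STATIONARY = 0
    DIFFERENCE_STATIONARY = 1
    TREND_STATIONARY = 2
    NOT_STATIONARY = 3


def combine(adf_stationary: bool, kpss_stationary: bool) -> Stationarity:
    """Map the two test verdicts to one label.

    adf_stationary: the ADF test rejected its unit-root null hypothesis.
    kpss_stationary: the KPSS test did NOT reject its stationarity null hypothesis.
    """
    if adf_stationary and kpss_stationary:
        return Stationarity.STATIONARY
    if adf_stationary:
        # Only ADF signals stationarity: differencing is expected to help.
        return Stationarity.DIFFERENCE_STATIONARY
    if kpss_stationary:
        # Only KPSS signals stationarity: removing the trend is expected to help.
        return Stationarity.TREND_STATIONARY
    return Stationarity.NOT_STATIONARY


label = combine(adf_stationary=False, kpss_stationary=True)
```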
- class timeeval.datasets.metadata.Trend(tpe: timeeval.datasets.metadata.TrendType, coef: float, confidence_r2: float)¶
Bases: object
timeeval.datasets.multi_dataset_manager
- class timeeval.datasets.multi_dataset_manager.MultiDatasetManager(data_folders: List[Union[str, Path]], custom_datasets_file: Optional[Union[str, Path]] = None)
Bases: Datasets
Provides read-only access to multiple benchmark dataset collections and their meta-information.
Manages dataset collections and their meta-information that are stored in multiple folders. The entries in all index files must be unique and are NOT allowed to overlap! Overlapping entries would lead to information loss!
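Because overlapping entries are not allowed, it can be useful to compare the dataset IDs of the individual index files before combining folders. The sketch below assumes the IDs of each folder have already been read into lists; `find_overlaps` and the folder names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

DatasetId = Tuple[str, str]


def find_overlaps(indices: Dict[str, List[DatasetId]]) -> List[DatasetId]:
    """Return every dataset ID that occurs more than once across all index files."""
    counts = Counter(ds_id for ids in indices.values() for ds_id in ids)
    return sorted(ds_id for ds_id, n in counts.items() if n > 1)


indices = {
    "folder_a": [("coll1", "ds1"), ("coll1", "ds2")],
    "folder_b": [("coll2", "ds1"), ("coll1", "ds2")],
}
overlaps = find_overlaps(indices)
```

An empty result means the folders can be combined without one entry shadowing another.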
- Parameters
data_folders (list of paths) – List of data paths that hold the datasets and the index files.
custom_datasets_file (path) – Path to a file listing additional custom datasets.
- Raises
FileNotFoundError – If the datasets.csv-file was not found in any of the data_folders.
See also
timeeval.datasets.Datasets, timeeval.datasets.DatasetManager
- df() DataFrame
Returns a copy of the internal dataset metadata collection.
The DataFrame has the following schema:
- Index:
dataset_name, collection_name
- Columns:
train_path, test_path, dataset_type, datetime_index, split_at, train_type, train_is_normal, input_type, length, dimensions, contamination, num_anomalies, min_anomaly_length, median_anomaly_length, max_anomaly_length, mean, stddev, trend, stationarity, period_size
- Returns
df – All custom and benchmark datasets and their metadata.
- Return type
data frame
- get(collection_name: Union[str, Tuple[str, str]], dataset_name: Optional[str] = None) Dataset¶
Returns dataset metadata.
Examples
>>> from timeeval.datasets import DatasetManager
>>> dm = DatasetManager("path/to/datasets")
>>> dataset_id = ("custom", "dataset1")
Access using the dataset ID:
>>> dm.get(dataset_id)
Dataset(datasetId=("custom", "dataset1"), ...)
Access using collection and dataset name:
>>> dm.get("custom", "dataset1")
Dataset(datasetId=("custom", "dataset1"), ...)
- get_collection_names() List[str]¶
Returns the unique dataset collection names (includes custom datasets if present).
- get_dataset_df(dataset_id: Tuple[str, str], train: bool = False) DataFrame¶
Loads the training/testing time series as a data frame.
- Parameters
- Returns
df – The training or testing time series as a pandas.DataFrame.
- Return type
data frame
- get_dataset_names() List[str]¶
Returns the unique dataset names (includes custom datasets if present).
- get_dataset_ndarray(dataset_id: Tuple[str, str], train: bool = False) ndarray¶
Loads the training/testing time series as a multi-dimensional array.
- get_dataset_path(dataset_id: Tuple[str, str], train: bool = False) Path¶
Returns the path to the training/testing time series of the dataset.
- get_detailed_metadata(dataset_id: Tuple[str, str], train: bool = False) DatasetMetadata¶
Computes detailed metadata about the training or testing time series of a dataset.
For most of the benchmark datasets, the detailed metadata is pre-computed and just has to be loaded from disk. For all other datasets, the time series is analyzed on the fly using timeeval.datasets.DatasetAnalyzer and the result is saved back to disk for later reuse. The metadata about custom datasets is not cached on disk! The following additional metadata is provided:
- Information about the training time series, if train=True is specified.
- Mean, variance, trend, and stationarity information for each channel of the time series individually.
- Parameters
- Returns
metadata – Detailed metadata about the training or testing time series.
- Return type
dataset metadata object
See also
timeeval.datasets.DatasetAnalyzer – Utility class used for the extraction of metadata.
timeeval.datasets.DatasetMetadata – Data class of the returned result.
- get_training_type(dataset_id: Tuple[str, str]) TrainingType¶
Returns the training type of a specific dataset.
- Parameters
dataset_id (tuple of str, str) – Dataset ID (collection and dataset name)
- Returns
training_type – Either unsupervised, semi-supervised, or supervised.
- Return type
TrainingType enum
See also
timeeval.TrainingType – Enumeration of training types that could be returned by this method.
- load_custom_datasets(file_path: Union[str, Path]) None¶
Reads a configuration file that contains additional datasets and adds them to the current dataset index.
You can add custom datasets to the dataset manager either using a constructor argument or using this method. The datasets from the configuration file are added to the internal dataset index and are then available for querying. The configuration file uses the JSON format and supports this structure:
{
  "dataset_name": {
    "test_path": "./path/to/test.csv",
    "train_path": "./path/to/train.csv",
    "type": "synthetic",
    "period": 10
  }
}
The properties train_path, type, and period are optional. Dataset names must be unique within the configuration file. The datasets are automatically assigned to the custom dataset collection.
Warning
Repeated calls to this method overwrite the existing custom dataset list.
- Parameters
file_path (
path) – Path to the custom dataset configuration file.
- select(collection: Optional[str] = None, dataset: Optional[str] = None, dataset_type: Optional[str] = None, datetime_index: Optional[bool] = None, training_type: Optional[TrainingType] = None, train_is_normal: Optional[bool] = None, input_dimensionality: Optional[InputDimensionality] = None, min_anomalies: Optional[int] = None, max_anomalies: Optional[int] = None, max_contamination: Optional[float] = None) List[Tuple[str, str]]¶
Returns a list of dataset identifiers from the dataset collection whose datasets match all of the given conditions.
- Parameters
collection (str) – restrict datasets to a specific collection
dataset (str) – restrict datasets to a specific name
dataset_type (str) – restrict dataset type (e.g. real or synthetic)
datetime_index (bool) – only select datasets for which a datetime index exists; if True: “timestamp”-column has datetime values; if False: “timestamp”-column has monotonically increasing integer values; this condition is ignored by custom datasets.
training_type (timeeval.TrainingType) – select datasets for specific training needs: SUPERVISED, SEMI_SUPERVISED, or UNSUPERVISED
train_is_normal (bool) – if True: only return datasets for which the training dataset does not contain anomalies; if False: only return datasets for which the training dataset contains anomalies
input_dimensionality (timeeval.InputDimensionality) – restrict dataset to input type, either univariate or multivariate
min_anomalies (int) – restrict datasets to those with at least min_anomalies anomalous subsequences
max_anomalies (int) – restrict datasets to those with at most max_anomalies anomalous subsequences
max_contamination (float) – restrict datasets to those having a contamination smaller than or equal to max_contamination
- Returns
dataset_names – A list of dataset identifiers (tuple of collection name and dataset name).
- Return type
List[Tuple[str,str]]