Algorithms¶
Any algorithm that can be called with a numpy array as parameter and a numpy array as return value can be evaluated.
TimeEval also supports passing only the file path to an algorithm and letting the algorithm read and parse the file itself.
In this case, the algorithm must be able to read the TimeEval canonical file format.
Use data_as_file=True as a keyword argument to the algorithm declaration.
The main function of an algorithm must implement the timeeval.adapters.base.Adapter-interface.
TimeEval comes with four different adapter types described in section Algorithm adapters.
Each algorithm is associated with metadata including its learning type and input dimensionality.
TimeEval distinguishes between the three learning types timeeval.TrainingType.UNSUPERVISED (default),
timeeval.TrainingType.SEMI_SUPERVISED, and timeeval.TrainingType.SUPERVISED
and the two input dimensionality definitions timeeval.InputDimensionality.UNIVARIATE (default) and
timeeval.InputDimensionality.MULTIVARIATE.
Registering algorithms¶
from timeeval import TimeEval, DatasetManager, Algorithm
from timeeval.adapters import FunctionAdapter
from timeeval.constants import HPI_CLUSTER
import numpy as np

def my_algorithm(data: np.ndarray) -> np.ndarray:
    return np.zeros_like(data)

datasets = [("WebscopeS5", "A1Benchmark-1")]
algorithms = [
    # Add algorithms to evaluate...
    Algorithm(
        name="MyAlgorithm",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
    )
]
timeeval = TimeEval(DatasetManager(HPI_CLUSTER.akita_dataset_paths[HPI_CLUSTER.BENCHMARK]), datasets, algorithms)
Algorithm adapters¶
Algorithm adapters allow you to use different algorithm types within TimeEval. The most basic adapter just wraps a Python function.
You can implement your own adapters. Example:
from typing import Optional
from timeeval.adapters.base import Adapter
from timeeval.data_types import AlgorithmParameter

class MyAdapter(Adapter):
    # AlgorithmParameter = Union[np.ndarray, Path]
    def _call(self, dataset: AlgorithmParameter, args: Optional[dict] = None) -> AlgorithmParameter:
        # e.g. create another process or make a call to another language
        pass
Function adapter¶
The timeeval.adapters.function.FunctionAdapter allows you to use Python functions and methods as the algorithm
main code.
You can use this adapter by wrapping your function:
from timeeval import Algorithm
from timeeval.adapters import FunctionAdapter
from timeeval.data_types import AlgorithmParameter
import numpy as np

def your_function(data: AlgorithmParameter, args: dict) -> np.ndarray:
    if isinstance(data, np.ndarray):
        return np.zeros_like(data)
    else:  # data = pathlib.Path
        return np.genfromtxt(data)[0]

Algorithm(
    name="MyPythonFunctionAlgorithm",
    main=FunctionAdapter(your_function),
    data_as_file=False
)
Docker adapter¶
The timeeval.adapters.docker.DockerAdapter allows you to run an algorithm as a Docker container.
This means that the algorithm is available as a Docker image.
This is the main adapter used for our evaluations.
Usage example:
from timeeval import Algorithm
from timeeval.adapters import DockerAdapter

Algorithm(
    name="MyDockerAlgorithm",
    main=DockerAdapter(image_name="algorithm-docker-image", tag="latest"),
    data_as_file=True  # important here!
)
Important
Using a DockerAdapter implies that data_as_file=True in the Algorithm construction.
The adapter supplies the dataset to the algorithm via bind-mounting and does not support passing the data as a numpy array.
Experimental algorithm adapters¶
The algorithm adapters in this section are prototypical implementations and not fully tested with TimeEval. Some adapters were used in earlier versions of TimeEval and are no longer compatible with it.
Warning
The following algorithm adapters should be used for educational purposes only. They are not fully tested with TimeEval!
Distributed adapter¶
The timeeval.adapters.distributed.DistributedAdapter allows you to execute an already distributed algorithm on multiple machines.
Supply a list of remote_hosts and a remote_command to this adapter.
It will use SSH to connect to the remote hosts and execute the remote_command on these hosts before starting the main algorithm locally.
Important
Password-less SSH access to the remote machines is required!
Do not combine with the distributed execution of TimeEval (“TimeEval.Distributed” using
TimeEval(..., distributed=True))! This will affect the timing results.
Jar adapter¶
The timeeval.adapters.jar.JarAdapter lets you evaluate Java algorithms in TimeEval.
You can supply the path to the Jar-File (executable) and any additional arguments to the Java-process call.
Adapter to apply univariate methods to multivariate data¶
The timeeval.adapters.multivar.MultivarAdapter allows you to apply a univariate algorithm to each dimension of a multivariate dataset individually
and receive a single aggregated result.
You can currently choose between three different result aggregation strategies that work on single points.
If n_jobs > 1, the algorithms are executed in parallel.
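The following is a minimal sketch of the idea behind this adapter, not the MultivarAdapter implementation itself. It uses only the Python standard library, and all names (`apply_per_dimension`, `deviation_from_mean`) are hypothetical. The aggregation functions used here (`max` and `statistics.mean`) are illustrative assumptions and not necessarily the strategies that the adapter offers. Parallel execution via n_jobs is not shown.

```python
import statistics
from typing import Callable, List, Sequence

# A univariate algorithm maps one series to one anomaly score per point.
UnivariateAlgorithm = Callable[[List[float]], List[float]]


def apply_per_dimension(
    algorithm: UnivariateAlgorithm,
    data: Sequence[Sequence[float]],
    aggregate: Callable[[Sequence[float]], float],
) -> List[float]:
    """Run `algorithm` on every column of `data` and aggregate the scores point-wise.

    `data` is a list of rows (time points); each row holds one value per dimension.
    """
    columns = [list(column) for column in zip(*data)]
    per_dimension_scores = [algorithm(column) for column in columns]
    # zip(*...) regroups the scores so that each tuple belongs to one time point.
    return [aggregate(point_scores) for point_scores in zip(*per_dimension_scores)]


def deviation_from_mean(series: List[float]) -> List[float]:
    """Toy univariate scorer (hypothetical): absolute deviation from the series mean."""
    mean = statistics.fmean(series)
    return [abs(value - mean) for value in series]


data = [[1.0, 10.0], [1.0, 10.0], [4.0, 10.0]]
max_scores = apply_per_dimension(deviation_from_mean, data, max)
mean_scores = apply_per_dimension(deviation_from_mean, data, statistics.mean)
print(max_scores)
print(mean_scores)
```

Because the aggregation works on single points, every time point receives exactly one score regardless of the number of dimensions.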
Algorithms provided with TimeEval¶
All algorithms that we provide with TimeEval use the DockerAdapter as adapter-implementation to allow you to use all features of TimeEval with them (such as resource restrictions, timeout, and fair runtime measurements).
You can find the TimeEval algorithm implementations on GitHub: https://github.com/TimeEval/TimeEval-algorithms and can pull the images directly from the GitHub container registry.
Using Docker images to bundle an algorithm for TimeEval also allows easy integration of new algorithms because there are no requirements regarding programming languages, frameworks, or tools.
However, using Docker images to bundle algorithms makes preparing them for use with TimeEval a bit more cumbersome (cf. How to integrate your own algorithm into TimeEval).
We use GitHub Actions to automatically build and publish the algorithm Docker images for direct use within TimeEval.
In this section, we describe some important aspects of this architecture.
TimeEval base Docker images¶
To benefit from Docker layer caching and to reduce code duplication (DRY!), we decided to put common functionality in so-called base images. The following is taken care of by base images:
Provide system (OS and common OS tools)
Provide language runtime (e.g. python3, java8)
Provide common libraries / algorithm dependencies
Define volumes for IO
Define Docker entrypoint script (performs initial container setup before the algorithm is executed)
Currently, we provide the following root base images:
| Name/Folder | Image | Usage |
|---|---|---|
| python2-base | | Base image for TimeEval methods that use python2 (version 2.7); includes default python packages. |
| python3-base | | Base image for TimeEval methods that use python3 (version 3.7.9); includes default python packages. |
| python36-base | | Base image for TimeEval methods that use python3.6 (version 3.6.13); includes default python packages. |
| r4-base | | Base image for TimeEval methods that use R (version 4.0.5). |
| java-base | | Base image for TimeEval methods that use Java (JRE 11.0.10). |
| rust-base | | Base image for TimeEval methods that use Rust (Rust 1.58). |
In addition to the root base images, we also provide some derived base images (intermediate images) that add further common functionality to the language runtimes:
| Name/Folder | Image | Usage |
|---|---|---|
| tsmp | | Base image for TimeEval methods that use the matrix profile R package |
| pyod | | Base image for TimeEval methods that are based on the |
| timeeval-test-algorithm | | Test image for TimeEval tests that use docker; is based on |
| python3-torch | | Base image for TimeEval methods that use python3 (version 3.7.9) and PyTorch (version 1.7.1); includes default python packages and torch; is based on |
You can find all current base images in the timeeval-algorithms-repository under 0-base-images and 1-intermediate-images.
TimeEval algorithm interface¶
TimeEval uses a common interface to execute all the algorithms that implement the DockerAdapter.
This means that the input, output, and parameterization are identical for all provided algorithms.
Execution and parametrization¶
All algorithms are executed by creating a Docker container using their Docker image and then executing it. The base images take care of the container startup and they call the main algorithm file with a single positional parameter. This parameter contains a String-representation of the algorithm configuration as JSON. Example parameter JSON (2022-08-18):
{
  "executionType": 'train' | 'execute',
  "dataInput": string,   # example: "path/to/dataset.csv",
  "dataOutput": string,  # example: "path/to/results.csv",
  "modelInput": string,  # example: "/path/to/model.pkl",
  "modelOutput": string, # example: "/path/to/model.pkl",
  "customParameters": dict
}
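An algorithm's main file could parse this parameter roughly as sketched below. This is an illustration under assumptions, not code from the TimeEval base images: the class `AlgorithmArgs`, the function `parse_args`, and the custom parameter `window_size` are hypothetical, and treating modelInput and modelOutput as optional is an assumption as well.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class AlgorithmArgs:
    """Typed view of the JSON configuration (field names mirror the JSON keys)."""
    executionType: str
    dataInput: str
    dataOutput: str
    modelInput: Optional[str] = None   # assumption: may be absent
    modelOutput: Optional[str] = None  # assumption: may be absent
    customParameters: Dict[str, Any] = field(default_factory=dict)


def parse_args(raw: str) -> AlgorithmArgs:
    """Parse the single positional parameter into an AlgorithmArgs object."""
    args = AlgorithmArgs(**json.loads(raw))
    if args.executionType not in ("train", "execute"):
        raise ValueError(f"Unknown executionType: {args.executionType!r}")
    return args


# Inside a container, `raw` would be taken from sys.argv[1].
raw = json.dumps({
    "executionType": "execute",
    "dataInput": "/data/dataset.csv",
    "dataOutput": "/results/anomaly_scores.csv",
    "customParameters": {"window_size": 20},  # hypothetical parameter
})
args = parse_args(raw)
print(args.executionType, args.dataInput, args.customParameters)
```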
Custom algorithm parameters¶
All algorithm hyperparameters described in the corresponding algorithm paper are exposed via the customParameters configuration option.
This allows us to set those parameters from TimeEval.
Warning
TimeEval does not parse a manifest.json file to get the custom parameters’ types and default values.
We expect the users of TimeEval to be familiar with the algorithms, so that they can specify the required parameters manually.
However, we require each algorithm to be executable without specifying any custom parameters (especially for testing purposes).
Therefore, please provide sensible default parameters for all custom parameters within the method’s code.
If you want to contribute your algorithm implementation to TimeEval, please add a manifest.json-file to your algorithm anyway to aid the integration into other tools and for user information.
If your algorithm does not use the default parameters automatically and expects them to be provided, your algorithm will fail during runtime if no parameters are provided by the TimeEval user.
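One way to make an algorithm executable without any custom parameters is to merge the supplied customParameters over defaults defined in the method's code. The sketch below assumes hypothetical parameter names (`window_size`, `n_neighbors`, `random_state`) and a hypothetical helper `resolve_parameters`.

```python
from typing import Any, Dict

# Hypothetical defaults for an illustrative algorithm; a real algorithm
# would use the defaults from its paper or reference implementation.
DEFAULT_PARAMETERS: Dict[str, Any] = {
    "window_size": 100,
    "n_neighbors": 5,
    "random_state": 42,
}


def resolve_parameters(custom_parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Return the defaults, overridden by any user-supplied custom parameters."""
    return {**DEFAULT_PARAMETERS, **custom_parameters}


no_overrides = resolve_parameters({})
with_override = resolve_parameters({"window_size": 20})
print(no_overrides)
print(with_override)
```

With this pattern, an empty customParameters dictionary yields a complete configuration, so the algorithm does not fail at runtime when the TimeEval user provides no parameters.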
Input and output¶
Input and output for an algorithm are handled via bind-mounting files and folders into the Docker container.
All input data, such as the training dataset and the test dataset, are mounted read-only to the /data-folder of the container.
The configuration options dataInput and modelInput reflect this with the correct path to the dataset (e.g. { "dataInput": "/data/dataset.test.csv" }).
The dataset format follows our Canonical file format.
All output of your algorithm should be written to the /results-folder.
This is also reflected in the configuration options with the correct paths for dataOutput and modelOutput (e.g. { "dataOutput": "/results/anomaly_scores.csv" }).
The /results-folder is also bind-mounted to the algorithm container, but writable, so that TimeEval can access the results after your algorithm has finished.
An algorithm can also use this folder to write persistent log and debug information.
Every algorithm must produce an anomaly scoring as output and put it at the location specified with the dataOutput-key in the configuration.
The output file’s format is CSV-based with a single column and no header.
For example, you can produce a correct anomaly scoring with NumPy’s numpy.savetxt-function: np.savetxt(<args.dataOutput>, arr, delimiter=",").
Temporary files and data of an algorithm are written to the current working directory (currently this is /app) or the temporary directory /tmp within the Docker container.
All files written to those folders are lost after the algorithm container is removed.
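For algorithms that do not depend on NumPy, the same single-column, header-less output can be written with the standard library. This is a minimal sketch; the function name `write_scores` is hypothetical, and a temporary directory stands in for the /results-folder so that the example runs outside a container.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable


def write_scores(scores: Iterable[float], data_output: Path) -> None:
    """Write one anomaly score per line: a single column and no header."""
    with open(data_output, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        for score in scores:
            writer.writerow([float(score)])


# Inside a container, the target would be the path given in `dataOutput`.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "anomaly_scores.csv"
    write_scores([0.0, 0.25, 1.0], target)
    written = target.read_text().splitlines()

print(written)
```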
Example calls¶
The following Docker command shows how the TimeEval DockerAdapter executes your algorithm image:
docker run --rm \
    -v <path/to/dataset.csv>:/data/dataset.csv:ro \
    -v <path/to/results-folder>:/results:rw \
    -e LOCAL_UID=<current user id> \
    -e LOCAL_GID=<groupid of akita group> \
    <resource restrictions> \
    ghcr.io/timeeval/<your_algorithm>:latest execute-algorithm '{
      "executionType": "execute",
      "dataInput": "/data/dataset.csv",
      "modelInput": "/results/model.pkl",
      "dataOutput": "/results/anomaly_scores.ts",
      "modelOutput": "/results/model.pkl",
      "customParameters": {}
    }'
The entry script of the base image translates this into the following call within the container:
docker run --rm \
    -v <path/to/dataset.csv>:/data/dataset.csv:ro \
    -v <path/to/results-folder>:/results:rw <...> \
    ghcr.io/timeeval/<your_algorithm>:latest bash
# now, within the container
<python | java -jar | Rscript> $ALGORITHM_MAIN '{
  "executionType": "execute",
  "dataInput": "/data/dataset.csv",
  "modelInput": "/results/model.pkl",
  "dataOutput": "/results/anomaly_scores.ts",
  "modelOutput": "/results/model.pkl",
  "customParameters": {}
}'
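To reproduce such a call when debugging an algorithm image by hand, the command can be assembled programmatically. The sketch below is not the DockerAdapter's implementation; the function `build_docker_command` is hypothetical, and the environment variables and resource restrictions shown above are left out for brevity.

```python
import json
import shlex
from typing import Any, Dict, List, Optional


def build_docker_command(
    image: str,
    dataset_path: str,
    results_path: str,
    custom_parameters: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Assemble a `docker run` argument list that mirrors the example call."""
    config = {
        "executionType": "execute",
        "dataInput": "/data/dataset.csv",
        "modelInput": "/results/model.pkl",
        "dataOutput": "/results/anomaly_scores.ts",
        "modelOutput": "/results/model.pkl",
        "customParameters": custom_parameters or {},
    }
    return [
        "docker", "run", "--rm",
        "-v", f"{dataset_path}:/data/dataset.csv:ro",  # input is mounted read-only
        "-v", f"{results_path}:/results:rw",           # results are writable
        image,
        "execute-algorithm",
        json.dumps(config),  # configuration as a single positional parameter
    ]


cmd = build_docker_command(
    "ghcr.io/timeeval/<your_algorithm>:latest",  # placeholder image name
    "/tmp/dataset.csv",
    "/tmp/results",
)
# shlex.join quotes the JSON string so that the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to subprocess.run instead of printing it avoids shell quoting issues with the JSON parameter.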