imputegap.recovery.manager package

Module contents

class imputegap.recovery.manager.TimeSeries

Bases: object

Class for managing and manipulating time series data.

This class allows importing, normalizing, and visualizing time series datasets. It also provides methods to contaminate the datasets with missing values and plot results.

Methods

__init__() :

Initializes the TimeSeries object.

import_matrix(data=None) :

Imports a matrix of time series data.

load_series(data, max_series=None, max_values=None, header=False, replace_nan=False) :

Loads time series data from a file or predefined dataset.

print(limit_timestamps=10, limit_series=7, view_by_series=False) :

Prints a limited number of time series from the dataset.

print_results(metrics, algorithm="", text="Imputation Results of") :

Prints the results of the imputation process.

normalize(normalizer="z_score") :

Normalizes the time series dataset.

plot(input_data, incomp_data=None, recov_data=None, max_series=None, max_values=None, series_range=None, subplot=False, size=(16, 8), save_path="./imputegap/assets", display=True) :

Plots the time series data, including raw, contaminated, or imputed data.

Contamination :

Class containing methods to contaminate time series data with missing values based on different patterns.

class Contamination

Bases: object

Inner class to apply contamination patterns to the time series data.

Methods

mcar(input_data, rate_dataset=0.2, rate_series=0.2, block_size=10, offset=0.1, seed=True, explainer=False) :

Apply Missing Completely at Random (MCAR) contamination to the time series data.

missing_percentage(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1) :

Apply missing percentage contamination to the time series data.

missing_percentage_at_random(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, seed=True) :

Apply missing percentage contamination with a random starting position to the time series data.

blackout(input_data, series_rate=0.2, offset=0.1) :

Apply blackout contamination to the time series data.

gaussian(input_data, rate_dataset=0.2, rate_series=0.2, std_dev=0.2, offset=0.1, seed=True) :

Apply Gaussian contamination to the time series data.

distribution(input_data, rate_dataset=0.2, rate_series=0.2, probabilities=None, offset=0.1, seed=True) :

Apply contamination to the time series data based on a given probability distribution.

disjoint(input_data, rate_series=0.1, limit=1, offset=0.1) :

Apply disjoint contamination to the time series data.

overlap(input_data, rate_series=0.2, limit=1, shift=0.05, offset=0.1) :

Apply overlapping contamination to the time series data.

blackout(input_data, series_rate=0.2, offset=0.1)

Apply blackout contamination to the time series data.

Parameters
input_data : numpy.ndarray

The time series dataset to contaminate.

series_rate : float, optional

Percentage of missing values per series (default is 0.2).

offset : float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

Returns
numpy.ndarray

The contaminated time series data.
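The interplay between series_rate and offset can be illustrated with a minimal sketch. It is not the library's implementation: blackout_sketch is a hypothetical helper, it works on plain lists where each inner list is one series, and it assumes that the missing block is contiguous and starts right after the protected offset section.

```python
import math


def blackout_sketch(data, series_rate=0.2, offset=0.1):
    """Hypothetical helper: blank the same contiguous block in every series."""
    contaminated = [list(series) for series in data]  # copy, input stays untouched
    for series in contaminated:
        n = len(series)
        start = int(n * offset)        # length of the protected prefix
        length = int(n * series_rate)  # number of values to remove
        for i in range(start, min(start + length, n)):
            series[i] = math.nan
    return contaminated


data = [[float(v) for v in range(10)], [float(v) for v in range(10, 20)]]
result = blackout_sketch(data, series_rate=0.2, offset=0.1)
```

With 10 values per series, the first value is protected and the next two are set to NaN in every series.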

disjoint(input_data, rate_series=0.1, limit=1, offset=0.1)

Apply disjoint contamination to the time series data.

Parameters
input_data : numpy.ndarray

The time series dataset to contaminate.

rate_series : float, optional

Percentage of missing values per series (default is 0.1).

limit : float, optional

Percentage expressing the limit index of the end of the contamination (default is 1: all length).

offset : float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

Returns
numpy.ndarray

The contaminated time series data.
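One way to read the disjoint pattern is that each series receives one missing block, and each block starts where the block of the previous series ended. The sketch below follows that reading as an assumption. disjoint_sketch is a hypothetical helper on plain lists of equal-length series, not the library's code.

```python
import math


def disjoint_sketch(data, rate_series=0.1, limit=1.0, offset=0.1):
    """Hypothetical helper: give each series its own non-overlapping missing block."""
    contaminated = [list(series) for series in data]
    n = len(contaminated[0])      # assumes all series have the same length
    block = int(n * rate_series)  # length of each missing block
    end = int(n * limit)          # no block may reach past this index
    start = int(n * offset)       # first block begins after the protected prefix
    for series in contaminated:
        stop = start + block
        if block == 0 or stop > end:
            break                 # remaining series stay uncontaminated
        for i in range(start, stop):
            series[i] = math.nan
        start = stop              # next series continues where this block ended
    return contaminated


data = [[float(v) for v in range(20)] for _ in range(3)]
result = disjoint_sketch(data, rate_series=0.2, limit=1.0, offset=0.1)
```

In this example the three series lose the index ranges 2 to 5, 6 to 9, and 10 to 13 respectively, so no timestamp is missing in more than one series.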

distribution(input_data, rate_dataset=0.2, rate_series=0.2, probabilities=None, offset=0.1, seed=True)

Apply contamination with a probabilistic distribution to the time series data.

Parameters
input_data : numpy.ndarray

The time series dataset to contaminate.

rate_dataset : float, optional

Percentage of series to contaminate (default is 0.2).

rate_series : float, optional

Percentage of missing values per series (default is 0.2).

probabilities : 2-D array-like, optional

The probability of being contaminated for each value of each series. Must match the shape of the input data without the offset (e.g. [[0.1, 0, 0.3, 0], [0.2, 0.1, 0.2, 0.9]]).

offset : float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

seed : bool, optional

Whether to use a seed for reproducibility (default is True).

Returns
numpy.ndarray

The contaminated time series data.
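The role of the probabilities matrix can be sketched as weighted sampling without replacement over the positions that follow the offset. The helper below, distribution_sketch, is hypothetical and works on plain lists. It assumes that the first int(n_series * rate_dataset) series are the ones contaminated and that a missing probabilities argument means uniform weights; the library may behave differently on both points.

```python
import math
import random


def distribution_sketch(data, rate_dataset=0.2, rate_series=0.2,
                        probabilities=None, offset=0.1, seed=True):
    """Hypothetical helper: remove values at positions drawn from given weights."""
    rng = random.Random(42 if seed else None)  # fixed seed for reproducibility
    contaminated = [list(series) for series in data]
    n_selected = int(len(contaminated) * rate_dataset)
    for row in range(n_selected):
        series = contaminated[row]
        n = len(series)
        start = int(n * offset)                # protected prefix
        if probabilities is None:
            weights = [1.0] * (n - start)      # assumption: uniform by default
        else:
            weights = list(probabilities[row])
        for _ in range(int(n * rate_series)):
            if sum(weights) <= 0:
                break                          # no candidate position left
            pick = rng.choices(range(len(weights)), weights=weights, k=1)[0]
            series[start + pick] = math.nan
            weights[pick] = 0.0                # never draw the same position twice
    return contaminated


data = [[float(v) for v in range(10)] for _ in range(2)]
probabilities = [
    [0, 0, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
]
result = distribution_sketch(data, rate_dataset=1.0, rate_series=0.2,
                             probabilities=probabilities, offset=0.2)
```

Each row of probabilities has 8 entries because the first 2 of the 10 values are protected by the offset. Positions with a weight of 0 are never removed.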

gaussian(input_data, rate_dataset=0.2, rate_series=0.2, std_dev=0.2, offset=0.1, seed=True)

Apply contamination with a Gaussian distribution to the time series data.

Parameters
input_data : numpy.ndarray

The time series dataset to contaminate.

rate_dataset : float, optional

Percentage of series to contaminate (default is 0.2).

rate_series : float, optional

Percentage of missing values per series (default is 0.2).

std_dev : float, optional

Standard deviation of the Gaussian distribution for missing values (default is 0.2).

offset : float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

seed : bool, optional

Whether to use a seed for reproducibility (default is True).

Returns
numpy.ndarray

The contaminated time series data.

mcar(input_data, rate_dataset=0.2, rate_series=0.2, block_size=10, offset=0.1, seed=True, explainer=False)

Apply Missing Completely at Random (MCAR) contamination to the time series data.

Parameters
input_data : numpy.ndarray

The time series dataset to contaminate.

rate_dataset : float, optional

Percentage of series to contaminate (default is 0.2).

rate_series : float, optional

Percentage of missing values per series (default is 0.2).

block_size : int, optional

Size of the block of missing data (default is 10).

offset : float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

seed : bool, optional

Whether to use a seed for reproducibility (default is True).

explainer : bool, optional

Whether to apply MCAR to specific series for explanation purposes (default is False).

Returns
numpy.ndarray

The contaminated time series data.
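The following sketch shows how rate_dataset, rate_series, block_size, offset, and seed could combine in a block-wise MCAR pattern. It is an illustration under stated assumptions, not the library's algorithm: mcar_sketch is a hypothetical helper on plain lists, the seed value 42 is arbitrary, blocks are drawn from fixed non-overlapping slots after the offset, and the explainer option is left out.

```python
import math
import random


def mcar_sketch(data, rate_dataset=0.2, rate_series=0.2,
                block_size=10, offset=0.1, seed=True):
    """Hypothetical helper: remove random blocks from a random subset of series."""
    rng = random.Random(42 if seed else None)  # fixed seed gives repeatable masks
    contaminated = [list(series) for series in data]
    n_series = len(contaminated)
    chosen = rng.sample(range(n_series), int(n_series * rate_dataset))
    for idx in chosen:
        series = contaminated[idx]
        n = len(series)
        start = int(n * offset)                # protected prefix
        # Candidate block starts: consecutive, non-overlapping slots after the offset.
        slots = list(range(start, n - block_size + 1, block_size))
        n_blocks = min(len(slots), int(n * rate_series) // block_size)
        for slot in rng.sample(slots, n_blocks):
            for i in range(slot, slot + block_size):
                series[i] = math.nan
    return contaminated


data = [[float(v) for v in range(100)] for _ in range(5)]
result = mcar_sketch(data, rate_dataset=0.4, rate_series=0.2, block_size=10)
```

Here 2 of the 5 series are contaminated, each losing 2 blocks of 10 values, and the first 10 values of every series remain intact. Calling the helper again with seed=True produces the same missing positions.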

missing_percentage(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1)

Apply missing percentage contamination to the time series data.

Parameters
input_data : numpy.ndarray

The time series dataset to contaminate.

rate_dataset : float, optional

Percentage of series to contaminate (default is 0.2).

rate_series : float, optional

Percentage of missing values per series (default is 0.2).

offset : float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

Returns
numpy.ndarray

The contaminated time series data.
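As a rough illustration of how rate_dataset limits the contamination to a subset of series, the sketch below removes one contiguous block after the offset in the first int(n_series * rate_dataset) series. Both the choice of the first series and the block position are assumptions, and missing_percentage_sketch is a hypothetical helper on plain lists.

```python
import math


def missing_percentage_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.1):
    """Hypothetical helper: remove one block in a fixed share of the series."""
    contaminated = [list(series) for series in data]
    n_selected = int(len(contaminated) * rate_dataset)  # series to contaminate
    for series in contaminated[:n_selected]:            # assumption: the first ones
        n = len(series)
        start = int(n * offset)                         # protected prefix
        length = int(n * rate_series)                   # values to remove
        for i in range(start, min(start + length, n)):
            series[i] = math.nan
    return contaminated


data = [[float(v) for v in range(10)] for _ in range(5)]
result = missing_percentage_sketch(data, rate_dataset=0.4, rate_series=0.2, offset=0.1)
```

With 5 series and rate_dataset=0.4, only the first 2 series lose values; the other 3 are returned unchanged.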

missing_percentage_at_random(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, seed=True)

Apply missing percentage contamination with random starting position to the time series data.

Parameters
input_data : numpy.ndarray

The time series dataset to contaminate.

rate_dataset : float, optional

Percentage of series to contaminate (default is 0.2).

rate_series : float, optional

Percentage of missing values per series (default is 0.2).

offset : float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

seed : bool, optional

Whether to use a seed for reproducibility (default is True).

Returns
numpy.ndarray

The contaminated time series data.

overlap(input_data, rate_series=0.2, limit=1, shift=0.05, offset=0.1)

Apply overlap contamination to the time series data.

Parameters
input_data : numpy.ndarray

The time series dataset to contaminate.

rate_series : float, optional

Percentage of missing values per series (default is 0.2).

limit : float, optional

Percentage expressing the limit index of the end of the contamination (default is 1: all length).

shift : float, optional

Percentage by which each missing block is shifted back into the previous disjoint block (default is 0.05).

offset : float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

Returns
numpy.ndarray

The contaminated time series data.
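The overlap pattern can be pictured as the disjoint pattern with each block pulled back by the shift, so that consecutive series share some missing timestamps. The sketch below implements that interpretation as an assumption. overlap_sketch is a hypothetical helper on plain lists of equal-length series, not the library's code.

```python
import math


def overlap_sketch(data, rate_series=0.2, limit=1.0, shift=0.05, offset=0.1):
    """Hypothetical helper: consecutive blocks that overlap by a fixed shift."""
    contaminated = [list(series) for series in data]
    n = len(contaminated[0])      # assumes all series have the same length
    block = int(n * rate_series)  # length of each missing block
    back = int(n * shift)         # how far each block reaches into the previous one
    end = int(n * limit)          # no block may reach past this index
    start = int(n * offset)       # first block begins after the protected prefix
    for series in contaminated:
        stop = start + block
        if block == 0 or stop > end:
            break
        for i in range(start, stop):
            series[i] = math.nan
        start = stop - back       # next block starts inside the current one
    return contaminated


data = [[float(v) for v in range(20)] for _ in range(3)]
result = overlap_sketch(data, rate_series=0.2, shift=0.05, offset=0.1)
```

In this example the blocks cover indices 2 to 5, 5 to 8, and 8 to 11, so each pair of neighbouring series shares exactly one missing timestamp.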

import_matrix(data=None)

Imports a matrix of time series data.

The data can be provided as a list or a NumPy array. The format is (Series, Values), where series are separated by spaces and values are separated by newline characters.

Parameters

data : list or numpy.ndarray, optional

The matrix of time series data to import.

Returns

TimeSeries

The TimeSeries object with the imported data.

load_series(data, max_series=None, max_values=None, header=False, replace_nan=False)

Loads time series data from a file or predefined dataset.

The data is loaded as a matrix of shape (Values, Series). You can limit the number of series or values per series for computational efficiency.

Parameters

data : str

The file path or name of a predefined dataset (e.g., "bafu.txt").

max_series : int, optional

The maximum number of series to load.

max_values : int, optional

The maximum number of values per series.

header : bool, optional

Whether the dataset has a header. Default is False.

replace_nan : bool, optional

Whether to replace NaN values already present in the dataset with 0. Default is False.

Returns

TimeSeries

The TimeSeries object with the loaded data.

normalize(normalizer='z_score')

Normalize the time series dataset.

Supported normalization techniques are "z_score" and "min_max". The method also logs the execution time for the normalization process.

Parameters

normalizer : str, optional

The normalization technique to use. Options are "z_score" or "min_max". Default is "z_score".

Returns

numpy.ndarray

The normalized time series data.
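For reference, the two techniques follow the standard definitions: z-score subtracts the mean and divides by the standard deviation, and min-max rescales values to the range [0, 1]. The sketch below applies them to a single series. Whether the library normalizes per series or over the whole dataset is not stated here, so normalize_sketch is a hypothetical helper that assumes per-series normalization with the population standard deviation.

```python
import statistics


def normalize_sketch(series, normalizer="z_score"):
    """Hypothetical helper: normalize one series with a standard technique."""
    if normalizer == "z_score":
        mean = statistics.fmean(series)
        std = statistics.pstdev(series)  # population standard deviation
        # A constant series has std == 0 and would need special handling.
        return [(v - mean) / std for v in series]
    if normalizer == "min_max":
        low, high = min(series), max(series)
        # A constant series has high == low and would need special handling.
        return [(v - low) / (high - low) for v in series]
    raise ValueError(f"unknown normalizer: {normalizer!r}")


z_scores = normalize_sketch([2.0, 4.0, 6.0], normalizer="z_score")
scaled = normalize_sketch([2.0, 4.0, 6.0], normalizer="min_max")
```

After z-score normalization the series has mean 0 and standard deviation 1; after min-max normalization its smallest value is 0 and its largest is 1.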

plot(input_data, incomp_data=None, recov_data=None, max_series=None, max_values=None, series_range=None, subplot=False, size=(16, 8), save_path='./imputegap/assets', display=True)

Plot the time series data, including raw, contaminated, or imputed data.

Parameters

input_data : numpy.ndarray

The original time series data without contamination.

incomp_data : numpy.ndarray, optional

The contaminated time series data.

recov_data : numpy.ndarray, optional

The imputed time series data.

max_series : int, optional

The maximum number of series to plot.

max_values : int, optional

The maximum number of values per series to plot.

series_range : int, optional

The index of a specific series to plot. If set, only this series will be plotted.

subplot : bool, optional

Whether to plot each time series in its own subplot (True) or all series in the same plot (False). Default is False.

size : tuple, optional

Size of the plot in inches. Default is (16, 8).

save_path : str, optional

Path to save the plot locally. Default is './imputegap/assets'.

display : bool, optional

Whether to display the plot. Default is True.

Returns

str or None

The file path of the saved plot, if applicable.

print(limit_timestamps=10, limit_series=7, view_by_series=False)

Prints a limited number of time series from the dataset.

Parameters

limit_timestamps : int, optional

The number of timestamps to print. Default is 10. Use -1 for no restriction.

limit_series : int, optional

The number of series to print. Default is 7. Use -1 for no restriction.

view_by_series : bool, optional

Whether to view by series (True) or by values (False). Default is False.

Returns

None

print_results(metrics, algorithm='', text='Imputation Results of')

Prints the results of the imputation process.

Parameters

metrics : dict

A dictionary containing the imputation metrics to display.

algorithm : str, optional

The name of the algorithm used for imputation.

text : str, optional

Output text printed to help the user. Default is 'Imputation Results of'.

Returns

None