imputegap.recovery.contamination package

Module contents

class imputegap.recovery.contamination.GenGap(verbose=True)

Bases: object

Class for contaminating time series data. It simulates missing values in the loaded dataset.

Methods

mcar(input_data, rate_dataset=0.2, rate_series=0.2, block_size=10, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True):

Apply Missing Completely at Random (MCAR) contamination to selected series.

aligned(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, single_series=-1, logic_by_series=True, explainer=False, verbose=True):

Apply aligned contamination to selected series.

blackout(input_data, rate_series=0.2, offset=0.1, logic_by_series=True, verbose=True):

Apply blackout contamination to selected series.

gaussian(input_data, rate_dataset=0.2, rate_series=0.2, selected_mean='position', std_dev=0.2, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True):

Apply Gaussian contamination to selected series.

distribution(input_data, rate_dataset=0.2, rate_series=0.2, probabilities_list=None, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True):

Apply contamination to the time series data according to user-supplied probabilities.

disjoint(input_data, rate_series=0.1, limit=1, offset=0.1, logic_by_series=True, verbose=True):

Apply disjoint contamination to selected series.

overlap(input_data, rate_series=0.2, limit=1, shift=0.05, offset=0.1, logic_by_series=True, verbose=True):

Apply overlapping contamination to selected series.

scattered(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True):

Apply scattered contamination to selected series.

aligned(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, single_series=-1, logic_by_series=True, explainer=False, verbose=True)

Missing blocks start and end at the same selected positions across the chosen series, resulting in aligned missing intervals.

Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html

Parameters

input_data: numpy.ndarray

The time series dataset to contaminate.

rate_dataset: float, optional

Percentage of series to contaminate (default is 0.2).

rate_series: float, optional

Percentage of missing values per series (default is 0.2).

offset: float, optional

Length of the initial uncontaminated segment of the series (default is 0.1). If offset < 1, it is interpreted as a fraction of the total series length. If offset >= 1, it is interpreted as the exact number of initial values to keep uncontaminated.

single_series: int, optional

Contaminate only the series with the given ID (default is -1, meaning not set).

logic_by_series: bool, optional

Contaminate the series based on the series (sensor) malfunction (default: True).

explainer: bool, optional

Only used within the Explainer Module to contaminate one series at a time (default: False).

verbose: bool, optional

Whether to display the contamination information (default is True).

Returns

numpy.ndarray

The contaminated time series data.

Example

>>> ts_m = GenGap.aligned(ts.data, rate_dataset=0.2, rate_series=0.4, offset=0.1)
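
To make the aligned pattern and the two readings of offset concrete, here is a minimal NumPy sketch. It is not the library's implementation: the helper names resolve_offset and aligned_sketch are hypothetical, and it assumes that rows are series and that the first few series are the ones selected.

```python
import numpy as np

def resolve_offset(offset, n_values):
    """Turn `offset` into a count of protected leading values."""
    # Values below 1 are read as a fraction of the series length,
    # values of 1 or more as an exact number of leading values.
    return int(round(offset * n_values)) if offset < 1 else int(offset)

def aligned_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.1):
    """Illustrative only: blank the same interval in the selected series."""
    out = np.array(data, dtype=float)  # copy, so the input stays untouched
    n_series, n_values = out.shape
    # Assumption: the first `n_selected` rows are the contaminated series.
    n_selected = max(1, int(round(n_series * rate_dataset)))
    start = resolve_offset(offset, n_values)
    width = int(round(n_values * rate_series))
    # Same start and end for every selected series: aligned blocks.
    out[:n_selected, start:start + width] = np.nan
    return out

data = np.arange(50, dtype=float).reshape(5, 10)
masked = aligned_sketch(data, rate_dataset=0.4, rate_series=0.3, offset=0.1)
```

Here two of the five series lose the same three positions, right after the one protected leading value.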

blackout(input_data, rate_series=0.2, offset=0.1, logic_by_series=True, verbose=True)

Apply blackout contamination to selected series.

Parameters

input_data: numpy.ndarray

The time series dataset to contaminate.

rate_series: float, optional

Percentage of missing values per series (default is 0.2).

offset: float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

logic_by_series: bool, optional

Contaminate the series based on the series (sensor) malfunction (default: True).

verbose: bool, optional

Whether to display the contamination information (default is True).

Returns

numpy.ndarray

The contaminated time series data.

Example

>>> ts_m = GenGap.blackout(ts.data, rate_series=0.2)
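
The sketch below illustrates one plausible reading of a blackout, in which every series loses the same interval at once. This is an assumption based on the absence of a rate_dataset parameter; blackout_sketch is a hypothetical helper, not the library's code.

```python
import numpy as np

def blackout_sketch(data, rate_series=0.2, offset=0.1):
    """Illustrative only: remove one shared interval from every series."""
    out = np.array(data, dtype=float)  # copy, so the input stays untouched
    n_values = out.shape[1]            # assumption: rows are series
    start = int(round(offset * n_values)) if offset < 1 else int(offset)
    width = int(round(rate_series * n_values))
    out[:, start:start + width] = np.nan
    return out

data = np.ones((3, 20))
masked = blackout_sketch(data, rate_series=0.2, offset=0.1)
```

With 20 values per series, positions 2 to 5 become NaN in all three series.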

disjoint(input_data, rate_series=0.1, limit=1, offset=0.1, logic_by_series=True, verbose=True)

Each missing block begins where the previous one ends, so the missing intervals are consecutive and do not overlap.

Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html

Parameters

input_data: numpy.ndarray

The time series dataset to contaminate.

rate_series: float, optional

Percentage of missing values per series (default is 0.1).

limit: float, optional

Fraction of the series length at which the contamination must stop (default is 1, the full length).

offset: float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

logic_by_series: bool, optional

Contaminate the series based on the series (sensor) malfunction (default: True).

verbose: bool, optional

Whether to display the contamination information (default is True).

Returns

numpy.ndarray

The contaminated time series data.

Example

>>> ts_m = GenGap.disjoint(ts.data, rate_series=0.1, limit=1, offset=0.1)
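
The disjoint pattern described above can be sketched in a few lines of NumPy. The helper disjoint_sketch is hypothetical and only mirrors the description (each block begins where the previous one ends); rounding and edge cases in the library may differ.

```python
import numpy as np

def disjoint_sketch(data, rate_series=0.1, limit=1, offset=0.1):
    """Illustrative only: consecutive, non-overlapping blocks, one per series."""
    out = np.array(data, dtype=float)  # copy, so the input stays untouched
    n_series, n_values = out.shape     # assumption: rows are series
    width = int(round(rate_series * n_values))
    start = int(round(offset * n_values))
    end_limit = int(round(limit * n_values))
    for row in range(n_series):
        if start >= end_limit:
            break  # nothing left to contaminate before the limit
        stop = min(start + width, end_limit)
        out[row, start:stop] = np.nan
        start = stop  # the next block begins where this one ends
    return out

data = np.zeros((4, 20))
masked = disjoint_sketch(data, rate_series=0.2, limit=1, offset=0.1)
```

In this run the four series lose positions 2 to 5, 6 to 9, 10 to 13 and 14 to 17 respectively, so no time step is missing in more than one series.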

distribution(input_data, rate_dataset=0.2, rate_series=0.2, probabilities_list=None, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True)

Missingness follows a probability distribution: each position has a certain chance of being missing.

Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html

Parameters

input_data: numpy.ndarray

The time series dataset to contaminate.

rate_dataset: float, optional

Percentage of series to contaminate (default is 0.2).

rate_series: float, optional

Percentage of missing values per series (default is 0.2).

probabilities_list: 2-D array-like, optional

Probability of being contaminated for each value of a series. Must match the shape of the input data without the offset, e.g. [[0.1, 0, 0.3, 0], [0.2, 0.1, 0.2, 0.9]] (default is None).

offset: float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

seed: bool, optional

Whether to use a seed for reproducibility (default is True).

logic_by_series: bool, optional

Contaminate the series based on the series (sensor) malfunction (default: True).

explainer: bool, optional

Only used within the Explainer Module to contaminate one series at a time (default: False).

verbose: bool, optional

Whether to display the contamination information (default is True).

Returns

numpy.ndarray

The contaminated time series data.

Example

>>> ts_m = GenGap.distribution(ts.data, rate_dataset=0.2, rate_series=0.2, probabilities_list=probabilities_list, offset=0.1)
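
The example above leaves probabilities_list undefined. The sketch below shows one way such a list could be shaped and used, assuming rows are series and that the list covers every position after the offset. distribution_sketch is a hypothetical helper that ignores rate_dataset and takes an integer seed, unlike the boolean seed documented above.

```python
import numpy as np

def distribution_sketch(data, probabilities_list, rate_series=0.2, offset=0.1, seed=42):
    """Illustrative only: draw missing positions from per-position weights."""
    out = np.array(data, dtype=float)  # copy, so the input stays untouched
    n_series, n_values = out.shape
    start = int(round(offset * n_values))
    n_missing = int(round(rate_series * n_values))
    rng = np.random.default_rng(seed)
    for row, weights in enumerate(probabilities_list):
        p = np.asarray(weights, dtype=float)
        p = p / p.sum()  # normalise so the weights sum to 1
        # Sample distinct positions inside the contaminable region.
        picked = rng.choice(n_values - start, size=n_missing, replace=False, p=p)
        out[row, start + picked] = np.nan
    return out

data = np.arange(20, dtype=float).reshape(2, 10)
probabilities_list = [
    [0, 0, 0, 0, 0, 0, 0.2, 0.3, 0.5],  # only the last three positions can go missing
    [1, 1, 1, 1, 1, 1, 1, 1, 1],        # every position is equally likely
]
masked = distribution_sketch(data, probabilities_list, rate_series=0.2, offset=0.1)
```

Each row of probabilities_list has nine entries because the ten-value series keep one protected value at the start.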

gaussian(input_data, rate_dataset=0.2, rate_series=0.2, selected_mean='position', std_dev=0.2, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True)

Missingness follows a Gaussian probability distribution: each position has a certain chance of being missing.

Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html

Parameters

input_data: numpy.ndarray

The time series dataset to contaminate.

rate_dataset: float, optional

Percentage of series to contaminate (default is 0.2).

rate_series: float, optional

Percentage of missing values per series (default is 0.2).

selected_mean: str, optional

Strategy used to compute the mean (default is "position"). Options: "position", "values".

std_dev: float, optional

Standard deviation of the Gaussian distribution for missing values (default is 0.2).

offset: float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

seed: bool, optional

Whether to use a seed for reproducibility (default is True).

logic_by_series: bool, optional

Contaminate the series based on the series (sensor) malfunction (default: True).

explainer: bool, optional

Only used within the Explainer Module to contaminate one series at a time (default: False).

verbose: bool, optional

Whether to display the contamination information (default is True).

Returns

numpy.ndarray

The contaminated time series data.

Example

>>> ts_m = GenGap.gaussian(ts.data, rate_series=0.2, std_dev=0.4, offset=0.1)
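
As a rough illustration of position-dependent missingness, the sketch below weights each position with a Gaussian bell centred in the middle of the contaminable region and then samples the missing positions from those weights. The centring, the normalised position axis, and the helper names gaussian_weights and gaussian_sketch are assumptions made for the example; the way the library derives the mean from selected_mean is not reproduced here.

```python
import numpy as np

def gaussian_weights(n_positions, std_dev=0.2):
    """Bell-shaped weights over positions rescaled to [0, 1], summing to 1."""
    positions = np.linspace(0.0, 1.0, n_positions)
    # Assumption: the mean sits at the centre of the region (0.5).
    weights = np.exp(-0.5 * ((positions - 0.5) / std_dev) ** 2)
    return weights / weights.sum()

def gaussian_sketch(series, rate_series=0.2, std_dev=0.2, offset=0.1, seed=42):
    """Illustrative only: sample missing positions from Gaussian weights."""
    out = np.array(series, dtype=float)  # copy, so the input stays untouched
    start = int(round(offset * out.size))
    n_missing = int(round(rate_series * out.size))
    p = gaussian_weights(out.size - start, std_dev)
    rng = np.random.default_rng(seed)
    picked = rng.choice(out.size - start, size=n_missing, replace=False, p=p)
    out[start + picked] = np.nan
    return out

series = np.arange(100, dtype=float)
masked = gaussian_sketch(series, rate_series=0.2, std_dev=0.2, offset=0.1)
```

A smaller std_dev concentrates the missing values around the centre, while a larger one spreads them more evenly.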

mcar(input_data, rate_dataset=0.2, rate_series=0.2, block_size=10, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True)

Missing blocks are introduced completely at random. Time series are selected at random, and blocks of a fixed size are removed at randomly chosen positions.

Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html

Parameters

input_data: numpy.ndarray

The time series dataset to contaminate.

rate_dataset: float, optional

Percentage of series to contaminate (default is 0.2).

rate_series: float, optional

Percentage of missing values per series (default is 0.2).

block_size: int, optional

Size of the block of missing data (default is 10).

offset: float, optional

Length of the initial uncontaminated segment of the series (default is 0.1). If offset < 1, it is interpreted as a fraction of the total series length. If offset >= 1, it is interpreted as the exact number of initial values to keep uncontaminated.

seed: bool, optional

Whether to use a seed for reproducibility (default is True).

logic_by_series: bool, optional

Contaminate the series based on the series (sensor) malfunction (default: True).

explainer: bool, optional

Only used within the Explainer Module to contaminate one series at a time (default: False).

verbose: bool, optional

Whether to display the contamination information (default is True).

Returns

numpy.ndarray

The contaminated time series data.

Example

>>> ts_m = GenGap.mcar(ts.data, rate_dataset=0.2, rate_series=0.4, block_size=10)
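
The relationship between rate_series, block_size and offset can be illustrated on a single series. The sketch below is a simplification, not the library's algorithm: the hypothetical helper mcar_sketch cuts the contaminable region into fixed slots and picks some of them at random, which keeps blocks from overlapping, and it takes an integer seed rather than the boolean seed documented above.

```python
import numpy as np

def mcar_sketch(series, rate_series=0.2, block_size=10, offset=0.1, seed=42):
    """Illustrative only: remove fixed-size blocks at random positions."""
    out = np.array(series, dtype=float)  # copy, so the input stays untouched
    start = int(round(offset * out.size))
    # Number of whole blocks needed to reach the requested missing rate.
    n_blocks = int(round(rate_series * out.size)) // block_size
    # Simplification: candidate blocks are aligned to slots after the offset.
    n_slots = (out.size - start) // block_size
    rng = np.random.default_rng(seed)
    for slot in rng.choice(n_slots, size=n_blocks, replace=False):
        begin = start + slot * block_size
        out[begin:begin + block_size] = np.nan
    return out

series = np.arange(200, dtype=float)
masked = mcar_sketch(series, rate_series=0.2, block_size=10, offset=0.1)
```

With 200 values, a rate of 0.2 and blocks of 10, four blocks are removed, and the first 20 values are never touched. Reusing the same seed reproduces the same mask.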

overlap(input_data, rate_series=0.2, limit=1, shift=0.05, offset=0.1, logic_by_series=True, verbose=True)

Each missing block starts at the end of the previous one with a specified shift, so the missing intervals are consecutive and overlap.

Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html

Parameters

input_data: numpy.ndarray

The time series dataset to contaminate.

rate_series: float, optional

Percentage of missing values per series (default is 0.2).

limit: float, optional

Fraction of the series length at which the contamination must stop (default is 1, the full length).

shift: float, optional

Percentage of shift applied to the start of each block relative to the end of the previous one (default is 0.05).

offset: float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

logic_by_series: bool, optional

Contaminate the series based on the series (sensor) malfunction (default: True).

verbose: bool, optional

Whether to display the contamination information (default is True).

Returns

numpy.ndarray

The contaminated time series data.

Example

>>> ts_m = GenGap.overlap(ts.data, rate_series=0.1, limit=1, shift=0.05, offset=0.1)
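
The overlapping pattern differs from the disjoint one only in where each block starts. The sketch below assumes that shift is a fraction of the series length and that each block starts that many positions before the end of the previous one; both points, and the helper name overlap_sketch, are assumptions made for illustration.

```python
import numpy as np

def overlap_sketch(data, rate_series=0.2, limit=1, shift=0.05, offset=0.1):
    """Illustrative only: consecutive blocks that overlap by `shift`."""
    out = np.array(data, dtype=float)  # copy, so the input stays untouched
    n_series, n_values = out.shape     # assumption: rows are series
    width = int(round(rate_series * n_values))
    back = int(round(shift * n_values))  # size of the overlap in positions
    start = int(round(offset * n_values))
    end_limit = int(round(limit * n_values))
    for row in range(n_series):
        if start >= end_limit:
            break  # nothing left to contaminate before the limit
        stop = min(start + width, end_limit)
        out[row, start:stop] = np.nan
        start = max(stop - back, 0)  # step back so the next block overlaps
    return out

data = np.zeros((3, 40))
masked = overlap_sketch(data, rate_series=0.2, limit=1, shift=0.05, offset=0.1)
```

In this run the three series lose positions 4 to 11, 10 to 17 and 16 to 23, so neighbouring blocks share two positions.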

scattered(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True)

The missing blocks all have the same size, but their starting positions are chosen at random.

Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html

Parameters

input_data: numpy.ndarray

The time series dataset to contaminate.

rate_dataset: float, optional

Percentage of series to contaminate (default is 0.2).

rate_series: float, optional

Percentage of missing values per series (default is 0.2).

offset: float, optional

Size of the uncontaminated section at the beginning of the series (default is 0.1).

seed: bool, optional

Whether to use a seed for reproducibility (default is True).

logic_by_series: bool, optional

Contaminate the series based on the series (sensor) malfunction (default: True).

explainer: bool, optional

Only used within the Explainer Module to contaminate one series at a time (default: False).

verbose: bool, optional

Whether to display the contamination information (default is True).

Returns

numpy.ndarray

The contaminated time series data.

Example

>>> ts_m = GenGap.scattered(ts.data, rate_dataset=0.2, rate_series=0.4, offset=0.1)
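
For comparison with the aligned pattern, the scattered pattern keeps the block size fixed but draws a separate start for each series. The sketch below is a minimal illustration under the assumption that rows are series; scattered_sketch is a hypothetical helper with an integer seed, not the library's code.

```python
import numpy as np

def scattered_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.1, seed=42):
    """Illustrative only: equal-size blocks at a random start per series."""
    out = np.array(data, dtype=float)  # copy, so the input stays untouched
    n_series, n_values = out.shape
    n_selected = max(1, int(round(n_series * rate_dataset)))
    width = int(round(rate_series * n_values))
    first = int(round(offset * n_values))  # earliest allowed start
    rng = np.random.default_rng(seed)
    rows = rng.choice(n_series, size=n_selected, replace=False)
    for row in rows:
        # Upper bound is exclusive, so the block always fits in the series.
        start = rng.integers(first, n_values - width + 1)
        out[row, start:start + width] = np.nan
    return out

data = np.zeros((10, 50))
masked = scattered_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.1)
```

Two of the ten series each lose one contiguous block of ten values, and the blocks generally start at different positions.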