imputegap.recovery.contamination package
Module contents
- class imputegap.recovery.contamination.GenGap(verbose=True)
Bases: object
Class for contaminating time series data. This class is used to simulate missing values in the loaded dataset.
Methods
- mcar(input_data, rate_dataset=0.2, rate_series=0.2, block_size=10, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True):
Apply Missing Completely at Random (MCAR) contamination to selected series.
- aligned(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, single_series=-1, logic_by_series=True, explainer=False, verbose=True):
Apply aligned contamination (missing blocks at the same positions) to selected series.
- blackout(input_data, rate_series=0.2, offset=0.1, logic_by_series=True, verbose=True):
Apply blackout contamination to selected series.
- gaussian(input_data, rate_dataset=0.2, rate_series=0.2, selected_mean='position', std_dev=0.2, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True):
Apply Gaussian contamination to selected series.
- distribution(input_data, rate_dataset=0.2, rate_series=0.2, probabilities_list=None, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True):
Apply contamination to the time series data according to a supplied list of per-position probabilities.
- disjoint(input_data, rate_series=0.1, limit=1, offset=0.1, logic_by_series=True, verbose=True):
Apply disjoint contamination to selected series.
- overlap(input_data, rate_series=0.2, limit=1, shift=0.05, offset=0.1, logic_by_series=True, verbose=True):
Apply overlapping contamination to selected series.
- scattered(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True):
Apply scattered contamination to selected series.
References
- aligned(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, single_series=-1, logic_by_series=True, explainer=False, verbose=True)
Missing blocks start and end at the same selected positions across the chosen series, resulting in aligned missing intervals.
Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html
Parameters
- input_data: numpy.ndarray
The time series dataset to contaminate.
- rate_dataset: float, optional
Percentage of series to contaminate (default is 0.2).
- rate_series: float, optional
Percentage of missing values per series (default is 0.2).
- offset: float, optional
Length of the initial uncontaminated segment of the series (default is 0.1). If offset < 1, it is interpreted as a fraction of the total series length. If offset >= 1, it is interpreted as the exact number of initial values to keep uncontaminated.
- single_series: int, optional
ID of a single series to contaminate; when set, only that series is targeted (default is -1, meaning not set).
- logic_by_series: bool, optional
Contaminate the series based on the series (sensor) malfunction (default is True).
- explainer: bool, optional
Only used within the Explainer Module to contaminate one series at a time (default is False).
- verbose: bool, optional
Whether to display the contamination information (default is True).
Returns
- numpy.ndarray
The contaminated time series data.
Example
>>> ts_m = GenGap.aligned(ts.data, rate_dataset=0.2, rate_series=0.4, offset=0.1)
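To make the aligned pattern and the two readings of offset concrete, here is a minimal NumPy sketch. It is not the library's implementation: resolve_offset and aligned_sketch are hypothetical names, rows are assumed to be series, and the sketch simply contaminates the first selected rows.

```python
import numpy as np

def resolve_offset(offset, length):
    """Protected prefix length: a fraction of the series if offset < 1, else a count."""
    return int(round(offset * length)) if offset < 1 else int(offset)

def aligned_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.1):
    """Illustrative only: the same missing interval in the first selected series."""
    out = np.array(data, dtype=float)               # work on a copy of the input
    n_series, length = out.shape                    # assumption: rows are series
    n_selected = max(1, int(round(rate_dataset * n_series)))
    start = resolve_offset(offset, length)          # first index allowed to go missing
    width = int(round(rate_series * length))        # missing values per series
    out[:n_selected, start:start + width] = np.nan  # identical start and end everywhere
    return out

ts = np.arange(200, dtype=float).reshape(10, 20)    # 10 series, 20 values each
ts_m = aligned_sketch(ts, rate_dataset=0.2, rate_series=0.4, offset=0.1)
```

Passing offset=5 to resolve_offset would protect exactly five leading values, whatever the series length.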
- blackout(input_data, rate_series=0.2, offset=0.1, logic_by_series=True, verbose=True)
Apply blackout contamination to selected series.
Parameters
- input_data: numpy.ndarray
The time series dataset to contaminate.
- rate_series: float, optional
Percentage of missing values per series (default is 0.2).
- offset: float, optional
Size of the uncontaminated section at the beginning of the series (default is 0.1).
- logic_by_series: bool, optional
Contaminate the series based on the series (sensor) malfunction (default is True).
- verbose: bool, optional
Whether to display the contamination information (default is True).
Returns
- numpy.ndarray
The contaminated time series data.
Example
>>> ts_m = GenGap.blackout(ts.data, rate_series=0.2)
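The text above does not spell out what a blackout looks like. The sketch below assumes the usual meaning, in which every series loses the same interval right after the protected prefix. blackout_sketch is a hypothetical helper for illustration, not the library code, and rows are assumed to be series.

```python
import numpy as np

def blackout_sketch(data, rate_series=0.2, offset=0.1):
    """Illustrative only: one shared missing interval across all series."""
    out = np.array(data, dtype=float)          # copy so the input is untouched
    length = out.shape[1]                      # assumption: rows are series
    start = int(round(offset * length))        # protected prefix (fractional offset)
    width = int(round(rate_series * length))   # length of the blackout
    out[:, start:start + width] = np.nan       # every series is hit at once
    return out

ts = np.ones((4, 50))
ts_m = blackout_sketch(ts, rate_series=0.2, offset=0.1)
```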
- disjoint(input_data, rate_series=0.1, limit=1, offset=0.1, logic_by_series=True, verbose=True)
Each missing block begins where the previous one ends, so the missing intervals are consecutive and do not overlap.
Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html
Parameters
- input_data: numpy.ndarray
The time series dataset to contaminate.
- rate_series: float, optional
Percentage of missing values per series (default is 0.1).
- limit: float, optional
Percentage of the series length at which the contamination must end (default is 1, the full length).
- offset: float, optional
Size of the uncontaminated section at the beginning of the series (default is 0.1).
- logic_by_series: bool, optional
Contaminate the series based on the series (sensor) malfunction (default is True).
- verbose: bool, optional
Whether to display the contamination information (default is True).
Returns
- numpy.ndarray
The contaminated time series data.
Example
>>> ts_m = GenGap.disjoint(ts.data, rate_series=0.1, limit=1, offset=0.1)
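As a rough illustration of the disjoint pattern, the following sketch places one block per series, each starting where the previous one ended, and stops once the limit is reached. disjoint_sketch is a hypothetical name, rows are assumed to be series, and the library may handle the limit differently.

```python
import numpy as np

def disjoint_sketch(data, rate_series=0.1, limit=1.0, offset=0.1):
    """Illustrative only: consecutive, non-overlapping blocks, one per series."""
    out = np.array(data, dtype=float)
    n_series, length = out.shape               # assumption: rows are series
    begin = int(round(offset * length))        # first block starts after the prefix
    width = int(round(rate_series * length))
    end_limit = int(round(limit * length))     # no block may extend past this index
    for i in range(n_series):
        stop = begin + width
        if stop > end_limit:                   # no room left before the limit
            break
        out[i, begin:stop] = np.nan
        begin = stop                           # next block begins where this one ends
    return out

ts = np.zeros((3, 40))
ts_m = disjoint_sketch(ts, rate_series=0.1, limit=1, offset=0.1)
```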
- distribution(input_data, rate_dataset=0.2, rate_series=0.2, probabilities_list=None, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True)
Missingness follows a probability distribution; each position has a certain chance of being missing.
Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html
Parameters
- input_data: numpy.ndarray
The time series dataset to contaminate.
- rate_dataset: float, optional
Percentage of series to contaminate (default is 0.2).
- rate_series: float, optional
Percentage of missing values per series (default is 0.2).
- probabilities_list: 2-D array-like, optional
The probability of being contaminated for each value of each series. Must match the shape of the input data without the offset (e.g. [[0.1, 0, 0.3, 0], [0.2, 0.1, 0.2, 0.9]]).
- offset: float, optional
Size of the uncontaminated section at the beginning of the series (default is 0.1).
- seed: bool, optional
Whether to use a seed for reproducibility (default is True).
- logic_by_series: bool, optional
Contaminate the series based on the series (sensor) malfunction (default is True).
- explainer: bool, optional
Only used within the Explainer Module to contaminate one series at a time (default is False).
- verbose: bool, optional
Whether to display the contamination information (default is True).
Returns
- numpy.ndarray
The contaminated time series data.
Example
>>> ts_m = GenGap.distribution(ts.data, rate_dataset=0.2, rate_series=0.2, probabilities_list=probabilities_list, offset=0.1)
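The shape requirement on probabilities_list is easy to get wrong, so here is a small sketch of how such a list could be built and used to pick missing positions. The weights are made up for illustration, and the sampling with NumPy's Generator.choice is an assumption about one possible approach, not the library's own sampler. The rows are normalised only because Generator.choice requires it.

```python
import numpy as np

n_series, length, offset, rate_series = 2, 10, 0.1, 0.2
protected = int(round(offset * length))      # leading values that stay intact
free = length - protected                    # positions that may go missing

# Hypothetical weights: later positions are more likely to be removed.
weights = np.tile(np.arange(1, free + 1, dtype=float), (n_series, 1))
probabilities_list = weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)
ts = np.ones((n_series, length))             # assumption: rows are series
ts_m = ts.copy()
n_missing = int(round(rate_series * length))
for i in range(n_series):
    # Draw distinct positions for this series according to its own probabilities.
    picks = rng.choice(free, size=n_missing, replace=False, p=probabilities_list[i])
    ts_m[i, protected + picks] = np.nan
```

Here probabilities_list has shape (2, 9): one row per series and one column per position after the protected prefix.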
- gaussian(input_data, rate_dataset=0.2, rate_series=0.2, selected_mean='position', std_dev=0.2, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True)
Missingness follows a Gaussian probability distribution; each position has a certain chance of being missing.
Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html
Parameters
- input_data: numpy.ndarray
The time series dataset to contaminate.
- rate_dataset: float, optional
Percentage of series to contaminate (default is 0.2).
- rate_series: float, optional
Percentage of missing values per series (default is 0.2).
- selected_mean: str, optional
Strategy to compute the mean value (default is “position”). Possible values: “position”, “values”.
- std_dev: float, optional
Standard deviation of the Gaussian distribution for missing values (default is 0.2).
- offset: float, optional
Size of the uncontaminated section at the beginning of the series (default is 0.1).
- seed: bool, optional
Whether to use a seed for reproducibility (default is True).
- logic_by_series: bool, optional
Contaminate the series based on the series (sensor) malfunction (default is True).
- explainer: bool, optional
Only used within the Explainer Module to contaminate one series at a time (default is False).
- verbose: bool, optional
Whether to display the contamination information (default is True).
Returns
- numpy.ndarray
The contaminated time series data.
Example
>>> ts_m = GenGap.gaussian(ts.data, rate_series=0.2, std_dev=0.4, offset=0.1)
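One way to picture Gaussian missingness is to weight each position by a bell curve and sample positions from those weights. The sketch below does that under stated assumptions: positions are scaled to [0, 1] and the curve is centred on the middle of the series, which is only a guess at what the “position” strategy means. gaussian_weights is a hypothetical helper, not a library function.

```python
import numpy as np

def gaussian_weights(n_positions, std_dev=0.2):
    """Normalised Gaussian weights over positions scaled to [0, 1], centred at 0.5."""
    x = np.linspace(0.0, 1.0, n_positions)
    pdf = np.exp(-0.5 * ((x - 0.5) / std_dev) ** 2)   # unnormalised bell curve
    return pdf / pdf.sum()                            # weights sum to 1

length, offset, rate_series = 50, 0.1, 0.2
protected = int(round(offset * length))               # leading values kept intact
weights = gaussian_weights(length - protected, std_dev=0.2)

rng = np.random.default_rng(42)
series = np.ones(length)
n_missing = int(round(rate_series * length))
# Positions near the centre are the most likely to be drawn.
picks = rng.choice(length - protected, size=n_missing, replace=False, p=weights)
series[protected + picks] = np.nan
```

A smaller std_dev concentrates the missing values around the centre, while a larger one spreads them more evenly along the series.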
- mcar(input_data, rate_dataset=0.2, rate_series=0.2, block_size=10, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True)
Missing blocks are introduced completely at random. Time series are selected at random, and blocks of a fixed size are removed at randomly chosen positions.
Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html
Parameters
- input_data: numpy.ndarray
The time series dataset to contaminate.
- rate_dataset: float, optional
Percentage of series to contaminate (default is 0.2).
- rate_series: float, optional
Percentage of missing values per series (default is 0.2).
- block_size: int, optional
Size of the block of missing data (default is 10).
- offset: float, optional
Length of the initial uncontaminated segment of the series (default is 0.1). If offset < 1, it is interpreted as a fraction of the total series length. If offset >= 1, it is interpreted as the exact number of initial values to keep uncontaminated.
- seed: bool, optional
Whether to use a seed for reproducibility (default is True).
- logic_by_series: bool, optional
Contaminate the series based on the series (sensor) malfunction (default is True).
- explainer: bool, optional
Only used within the Explainer Module to contaminate one series at a time (default is False).
- verbose: bool, optional
Whether to display the contamination information (default is True).
Returns
- numpy.ndarray
The contaminated time series data.
Example
>>> ts_m = GenGap.mcar(ts.data, rate_dataset=0.2, rate_series=0.4, block_size=10)
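The MCAR description above (random series, fixed-size blocks, random positions) can be sketched as follows. This is a simplified illustration, not the library's algorithm: mcar_sketch and rng_seed are hypothetical names, rows are assumed to be series, and blocks are restricted to non-overlapping slots so that the number of missing values is exact.

```python
import numpy as np

def mcar_sketch(data, rate_dataset=0.2, rate_series=0.2, block_size=10,
                offset=0.1, rng_seed=42):
    """Illustrative only: fixed-size blocks removed at random in random series."""
    rng = np.random.default_rng(rng_seed)
    out = np.array(data, dtype=float)
    n_series, length = out.shape                        # assumption: rows are series
    first = int(round(offset * length))                 # protected prefix
    n_selected = max(1, int(round(rate_dataset * n_series)))
    n_blocks = int(round(rate_series * length)) // block_size
    n_slots = (length - first) // block_size            # candidate block positions
    for s in rng.choice(n_series, size=n_selected, replace=False):
        # Pick distinct slots so blocks never overlap (a simplification).
        for slot in rng.choice(n_slots, size=n_blocks, replace=False):
            begin = first + slot * block_size
            out[s, begin:begin + block_size] = np.nan
    return out

ts = np.ones((10, 100))
ts_m = mcar_sketch(ts, rate_dataset=0.2, rate_series=0.4, block_size=10)
```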
- overlap(input_data, rate_series=0.2, limit=1, shift=0.05, offset=0.1, logic_by_series=True, verbose=True)
Each missing block starts at the end of the previous one with a specified shift, so the missing intervals are consecutive and overlap.
Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html
Parameters
- input_data: numpy.ndarray
The time series dataset to contaminate.
- rate_series: float, optional
Percentage of missing values per series (default is 0.2).
- limit: float, optional
Percentage of the series length at which the contamination must end (default is 1, the full length).
- shift: float, optional
Percentage by which each block is shifted into the preceding block, so that consecutive blocks overlap (default is 0.05).
- offset: float, optional
Size of the uncontaminated section at the beginning of the series (default is 0.1).
- logic_by_series: bool, optional
Contaminate the series based on the series (sensor) malfunction (default is True).
- verbose: bool, optional
Whether to display the contamination information (default is True).
Returns
- numpy.ndarray
The contaminated time series data.
Example
>>> ts_m = GenGap.overlap(ts.data, rate_series=0.1, limit=1, shift=0.05, offset=0.1)
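The overlap pattern differs from the disjoint one only in where each new block starts. The sketch below assumes shift is a fraction of the series length and that each block starts that many positions before the previous block ends; both points are assumptions, since the text does not state what shift is relative to. overlap_sketch is a hypothetical name.

```python
import numpy as np

def overlap_sketch(data, rate_series=0.2, limit=1.0, shift=0.05, offset=0.1):
    """Illustrative only: consecutive blocks that overlap by a fixed shift."""
    out = np.array(data, dtype=float)
    n_series, length = out.shape               # assumption: rows are series
    begin = int(round(offset * length))        # first block starts after the prefix
    width = int(round(rate_series * length))
    back = int(round(shift * length))          # overlap with the previous block
    end_limit = int(round(limit * length))
    for i in range(n_series):
        stop = min(begin + width, end_limit)   # never run past the limit
        if begin >= stop:
            break
        out[i, begin:stop] = np.nan
        begin = stop - back                    # step back to create the overlap
    return out

ts = np.zeros((3, 100))
ts_m = overlap_sketch(ts, rate_series=0.2, limit=1, shift=0.05, offset=0.1)
```

With these values the blocks cover indices 10 to 29, 25 to 44, and 40 to 59, so neighbouring series share five missing positions.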
- scattered(input_data, rate_dataset=0.2, rate_series=0.2, offset=0.1, seed=True, logic_by_series=True, explainer=False, verbose=True)
The missing blocks all have the same size, but their starting positions are chosen at random.
Docs: https://imputegap.readthedocs.io/en/latest/missingness_patterns.html
Parameters
- input_data: numpy.ndarray
The time series dataset to contaminate.
- rate_dataset: float, optional
Percentage of series to contaminate (default is 0.2).
- rate_series: float, optional
Percentage of missing values per series (default is 0.2).
- offset: float, optional
Size of the uncontaminated section at the beginning of the series (default is 0.1).
- seed: bool, optional
Whether to use a seed for reproducibility (default is True).
- logic_by_series: bool, optional
Contaminate the series based on the series (sensor) malfunction (default is True).
- explainer: bool, optional
Only used within the Explainer Module to contaminate one series at a time (default is False).
- verbose: bool, optional
Whether to display the contamination information (default is True).
Returns
- numpy.ndarray
The contaminated time series data.
Example
>>> ts_m = GenGap.scattered(ts.data, rate_dataset=0.2, rate_series=0.4, offset=0.1)
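For comparison with the aligned pattern, a scattered contamination keeps the block size fixed but draws a separate start position for each series. The sketch below is a minimal illustration under assumptions: scattered_sketch and rng_seed are hypothetical names, rows are assumed to be series, and the first selected rows are contaminated rather than a random subset.

```python
import numpy as np

def scattered_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.1,
                     rng_seed=42):
    """Illustrative only: equal-size blocks starting at random positions."""
    rng = np.random.default_rng(rng_seed)
    out = np.array(data, dtype=float)
    n_series, length = out.shape                 # assumption: rows are series
    first = int(round(offset * length))          # protected prefix
    width = int(round(rate_series * length))     # same block size for every series
    n_selected = max(1, int(round(rate_dataset * n_series)))
    for s in range(n_selected):
        # Upper bound is exclusive, so the block always fits inside the series.
        begin = int(rng.integers(first, length - width + 1))
        out[s, begin:begin + width] = np.nan
    return out

ts = np.ones((10, 50))
ts_m = scattered_sketch(ts, rate_dataset=0.2, rate_series=0.4, offset=0.1)
```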