imputegap.recovery.explainer package

Module contents

class imputegap.recovery.explainer.Explainer

Bases: object

A class to manage SHAP-based model explanations and feature extraction for time series datasets.

Methods

load_configuration(file_path=None): Load categories and features from a TOML file.
extractor_pycatch(data, features_categories, features_list, do_catch24=True): Extract features from time series data using pycatch22.
extractor_tsfresh(data, categories=['statistical', 'temporal', 'shape', 'frequency']): Extract features from time series data using tsfresh. The function supports filtering by feature category and computes a large set of features (up to 783) covering statistics, temporal dynamics, frequency analysis, and shape.
extractor_tsfel(data, frequency=None, categories=['spectral', 'statistical', 'temporal', 'fractal']): Extract features from time series data using TSFEL (Time Series Feature Extraction Library). This method computes features for the selected categories and can optionally use the sampling frequency to compute frequency-domain features.
print(shap_values, shap_details=None): Print SHAP values and details for display.
convert_results(tmp, file, algo, descriptions, features, categories, mean_features, to_save): Convert raw SHAP results into a refined format for display.
execute_shap_model(x_dataset, x_information, y_dataset, file, algorithm, splitter=10, extractor='pycatch', display=False, verbose=False): Launch the SHAP model to explain the dataset features.
shap_explainer(input_data, algorithm='cdrec', params=None, extractor='pycatch', pattern='mcar', missing_rate=0.4, block_size=10, offset=0.1, seed=True, limit_ratio=1, split_ratio=0.6, file_name='ts', display=False, verbose=False): Handle parameters and set variables to launch the SHAP model.

convert_results(tmp, file, algo, descriptions, features, categories, mean_features, to_save)

Convert SHAP raw results to a refined format for display.

Parameters

tmp (list): Current SHAP results.
file (str): Dataset used.
algo (str): Algorithm used for imputation.
descriptions (list): Descriptions of each feature.
features (list): Raw names of each feature.
categories (list): Categories of each feature.
mean_features (list): Mean values of each feature.
to_save (str): Path to save results.

Returns

list: A list of processed SHAP results.
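As an illustration of what a "refined format" can mean, the sketch below ranks features by their mean absolute SHAP value and averages that importance per category. It is a hypothetical helper (`summarize_shap`) written with NumPy, not the logic of `convert_results`, whose exact output layout is not specified here.

```python
import numpy as np


def summarize_shap(shap_matrix, features, categories):
    """Rank features and categories by mean absolute SHAP value (hypothetical helper).

    shap_matrix: array of shape (n_series, n_features), one SHAP value per feature.
    features: feature names aligned with the columns of shap_matrix.
    categories: category names aligned with features.
    """
    importance = np.abs(np.asarray(shap_matrix, dtype=float)).mean(axis=0)

    # Per-feature ranking: (feature, category, mean |SHAP|), largest first.
    per_feature = sorted(
        zip(features, categories, importance.tolist()),
        key=lambda row: row[2],
        reverse=True,
    )

    # Per-category aggregate: average importance of the features in each category.
    per_category = {}
    for category in dict.fromkeys(categories):  # unique categories, original order
        mask = np.array([c == category for c in categories])
        per_category[category] = float(importance[mask].mean())
    return per_feature, per_category


shap_matrix = [[0.2, -0.5, 0.1],
               [-0.4, 0.3, 0.1]]
features = ["mean", "acf_lag1", "spectral_entropy"]
categories = ["statistical", "temporal", "spectral"]
per_feature, per_category = summarize_shap(shap_matrix, features, categories)
print(per_feature[0][:2])  # ('acf_lag1', 'temporal')
```

Averaging absolute values is the usual way to turn per-series SHAP values into a global importance score, since positive and negative contributions would otherwise cancel.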

execute_shap_model(x_dataset, x_information, y_dataset, file, algorithm, splitter=10, extractor='pycatch', display=False, verbose=False)

Launch the SHAP model for explaining the features of the dataset.

Parameters

x_dataset (numpy.ndarray): Dataset of extracted features, with descriptions.
x_information (list): Descriptions of all features, grouped by category.
y_dataset (numpy.ndarray): RMSE label of each series.
file (str): Dataset used for the SHAP analysis.
algorithm (str): Algorithm used for imputation (e.g., 'cdrec', 'stmvl', 'iim', 'mrnn').
splitter (int, optional): Split ratio between training and testing data (default is 10).
extractor (str, optional): Feature extractor used for the regression (default is 'pycatch'; e.g., 'pycatch', 'tsfel').
display (bool, optional): Whether to display the SHAP plots (default is False).
verbose (bool, optional): Whether to print detailed output (default is False).

Returns

list: Results of the SHAP explainer model.
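The method explains a model that predicts each series' RMSE label from its extracted features. The NumPy-only sketch below is a stand-in, not the package's implementation: it assumes a linear least-squares regressor, for which SHAP values under feature independence have the closed form phi_j = w_j * (x_j - E[x_j]). Treating `splitter` as the number of leading rows used for training is also an assumption, and `linear_shap` is a hypothetical name.

```python
import numpy as np


def linear_shap(x_dataset, y_dataset, splitter):
    """Fit a linear model on the first `splitter` rows and explain the remaining rows.

    Assumption: `splitter` is the number of training rows (hypothetical reading).
    Returns (shap_values, base_value, predictions) for the test rows.
    """
    x = np.asarray(x_dataset, dtype=float)
    y = np.asarray(y_dataset, dtype=float)
    x_train, y_train, x_test = x[:splitter], y[:splitter], x[splitter:]

    # Least-squares fit with an intercept column.
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    intercept, weights = coef[0], coef[1:]

    # Exact SHAP values of a linear model with independent features:
    # phi_ij = w_j * (x_ij - mean_j), using the training mean as the background.
    background_mean = x_train.mean(axis=0)
    shap_values = (x_test - background_mean) * weights
    base_value = intercept + background_mean @ weights
    predictions = intercept + x_test @ weights
    return shap_values, base_value, predictions


rng = np.random.default_rng(0)
features = rng.normal(size=(15, 3))                  # 15 series, 3 extracted features
rmse = 0.5 + 2.0 * features[:, 0] - features[:, 2]   # synthetic stand-in for RMSE labels
shap_values, base_value, predictions = linear_shap(features, rmse, splitter=10)

# Local accuracy: base value + sum of SHAP values reproduces each prediction.
print(np.allclose(base_value + shap_values.sum(axis=1), predictions))  # True
```

The final check illustrates the additivity property that makes SHAP values readable: each prediction splits exactly into a baseline plus one contribution per feature.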

extractor_pycatch(data, features_categories, features_list, do_catch24=True)

Extract features from time series data using pycatch22.

Parameters

data (numpy.ndarray): Time series dataset for feature extraction.
features_categories (dict): Dictionary that maps feature names to categories.
features_list (dict): Dictionary of all expected features.
do_catch24 (bool, optional): Whether to also compute the mean and standard deviation, extending catch22 to catch24 (default is True).

Returns

tuple: A tuple containing:
- results (dict): A dictionary of feature values keyed by feature name.
- descriptions (list): A list of tuples containing feature names, categories, and descriptions.
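With `do_catch24=True`, the two features that extend catch22 to catch24 are the mean and the standard deviation of each series. The NumPy sketch below computes only those two extra features and returns them in the `(results, descriptions)` shape described above. The helper name `catch24_extras`, the feature names, the `"statistical"` category, and the averaging over series are illustrative assumptions; pycatch22 itself may use a different standard-deviation convention.

```python
import numpy as np


def catch24_extras(data):
    """Compute the mean and standard deviation added by catch24 (illustrative only).

    data: array of shape (M, N), one time series per row.
    Returns (results, descriptions); values are averaged over the M series,
    which is an assumption made for this sketch.
    """
    data = np.asarray(data, dtype=float)
    per_series = {
        "mean": data.mean(axis=1),
        "standard_deviation": data.std(axis=1),  # population SD (ddof=0)
    }
    results = {name: float(values.mean()) for name, values in per_series.items()}
    descriptions = [
        ("mean", "statistical", "Mean of the series"),
        ("standard_deviation", "statistical", "Standard deviation of the series"),
    ]
    return results, descriptions


data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 2.0, 2.0, 2.0]])
results, descriptions = catch24_extras(data)
print(results["mean"])  # 2.25
```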

extractor_tsfel(data, frequency=None, categories=['spectral', 'statistical', 'temporal', 'fractal'])

Extract features using TSFEL (Time Series Feature Extraction Library).

This function extracts features from the input time series data based on the specified categories. The categories determine the type of features to compute, such as spectral, statistical, temporal, or fractal features. Optionally, a frequency value can be provided to compute frequency-specific features.

Parameters

data (numpy.ndarray):
2D array of shape (M, N), where M is the number of time series and N is the number of values per time series. Each row represents a separate time series.

frequency (float, optional):
The sampling frequency of the time series data. This is used for spectral feature calculations (e.g., FFT-based features). If None, spectral features will be computed using default assumptions.

categories (list, optional):

A list of categories to extract. Valid categories are:

“spectral”: Extract frequency-domain features (e.g., FFT, spectral entropy).

“statistical”: Extract basic statistical features (e.g., mean, variance, skewness).

“temporal”: Extract temporal-domain features (e.g., autocorrelation, zero crossings).

“fractal”: Extract fractal-related features (e.g., Hurst exponent, fractal dimension).

By default, all four categories are extracted.
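The sampling frequency matters for spectral features because it converts FFT bin indices into physical frequencies. As an illustration in plain NumPy (not TSFEL's own implementation), the sketch below computes the dominant frequency of each series in hertz; `dominant_frequency` is a hypothetical helper.

```python
import numpy as np


def dominant_frequency(data, frequency):
    """Return the frequency (Hz) of the largest FFT peak for each row of data.

    data: array of shape (M, N), one time series per row.
    frequency: sampling frequency in Hz.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    # Remove the mean so the DC component does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(data - data.mean(axis=1, keepdims=True), axis=1))
    bins = np.fft.rfftfreq(n, d=1.0 / frequency)  # maps bin index -> Hz
    return bins[np.argmax(spectrum, axis=1)]


fs = 50.0                                       # samples per second
t = np.arange(100) / fs                         # 2 seconds of samples
data = np.vstack([np.sin(2 * np.pi * 5 * t),    # 5 Hz sine
                  np.sin(2 * np.pi * 12 * t)])  # 12 Hz sine
print(dominant_frequency(data, fs))  # [ 5. 12.]
```

Without a sampling frequency, the same peaks could only be reported as bin indices (10 and 24 here), which is why frequency-domain features depend on it.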

Returns

dict:
A dictionary where keys are feature names and values are the computed feature values for the entire dataset (aggregated over all time series).

list:

A list of tuples, where each tuple contains:

Feature name (str): The name of the feature.

Category (str): The category to which the feature belongs.

Formatted feature name (str): A human-readable version of the feature name.

Example

>>> import numpy as np
>>> from imputegap.recovery.explainer import Explainer
>>> data = np.random.rand(5, 100)  # 5 time series, each with 100 values
>>> results, descriptions = Explainer.extractor_tsfel(data, frequency=50, categories=["statistical", "temporal"])
>>> print(results)
>>> print(descriptions)

Notes

This function requires TSFEL to be installed: pip install tsfel.
Categories can be customized to extract only the desired features, reducing computation time.

extractor_tsfresh(data, categories=['statistical', 'temporal', 'shape', 'frequency'])

Extract features using tsfresh and group them into four categories: statistical, temporal, frequency, and shape-based.

Parameters

data (numpy.ndarray): 2D array of shape (M, N), where M is the number of series and N is the number of values per series.

categories (list, optional): List of categories to extract (default is all four). Must include one or more of: ["statistical", "temporal", "frequency", "shape"].

Returns

dict: A dictionary with feature names as keys and their aggregated values as values.
list: A list of tuples (feature_name, category, formatted_feature_name).
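The two return values can be pictured with a small stand-in: compute a few features per series, keep only those whose category was requested, average each over the M series, and describe each kept feature with a (feature_name, category, formatted_feature_name) tuple. The feature set, the category mapping, the name formatting, and the helper `extract_by_category` are hypothetical and far smaller than tsfresh's; only the shape of the output follows the description above.

```python
import numpy as np

# Hypothetical feature set: name -> (category, function applied to one series).
FEATURES = {
    "mean": ("statistical", np.mean),
    "variance": ("statistical", np.var),
    "abs_sum_of_changes": ("temporal", lambda s: np.sum(np.abs(np.diff(s)))),
    "fft_peak_magnitude": ("frequency", lambda s: np.max(np.abs(np.fft.rfft(s))[1:])),
}


def extract_by_category(data, categories=("statistical", "temporal", "frequency", "shape")):
    """Return (results, descriptions) for the requested categories only."""
    data = np.asarray(data, dtype=float)
    results, descriptions = {}, []
    for name, (category, func) in FEATURES.items():
        if category not in categories:
            continue  # skip categories the caller did not ask for
        per_series = [func(series) for series in data]
        results[name] = float(np.mean(per_series))  # aggregate over the M series
        formatted = name.replace("_", " ").capitalize()
        descriptions.append((name, category, formatted))
    return results, descriptions


data = np.array([[0.0, 1.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0, 1.0]])
results, descriptions = extract_by_category(data, categories=["statistical", "temporal"])
print(results)          # {'mean': 0.75, 'variance': 0.125, 'abs_sum_of_changes': 1.5}
print(descriptions[2])  # ('abs_sum_of_changes', 'temporal', 'Abs sum of changes')
```

Filtering before computing, rather than after, is what makes a narrower category list cheaper to run.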

load_configuration(file_path=None)

Load categories and features from a TOML file.

Parameters

file_path (str, optional): The path to the TOML file (default is None). If None, the default configuration file is loaded.

Returns

tuple: A tuple (categories, features, config), where categories and features are dictionaries.

print(shap_values, shap_details=None)

Print SHAP values and details for display.

Parameters

shap_values (list): The SHAP values and results of the SHAP analysis.
shap_details (list, optional): Input and output data of the regression, if available (default is None).

Returns

None

shap_explainer(input_data, algorithm='cdrec', params=None, extractor='pycatch', pattern='mcar', missing_rate=0.4, block_size=10, offset=0.1, seed=True, limit_ratio=1, split_ratio=0.6, file_name='ts', display=False, verbose=False)

Handle parameters and set variables to launch the SHAP model.

Parameters

input_data (numpy.ndarray): The original time series dataset.
algorithm (str, optional): The algorithm used for imputation (default is 'cdrec'). Valid values: 'cdrec', 'stmvl', 'iim', 'mrnn'.
params (dict, optional): Parameters for the algorithm (default is None).
pattern (str, optional): Contamination pattern to apply (default is 'mcar').
extractor (str, optional): Extractor used to compute the features of the data (default is 'pycatch'). Valid values: 'pycatch', 'tsfel', 'tsfresh'.
missing_rate (float, optional): Proportion of missing values per series (default is 0.4, i.e., 40%).
block_size (int, optional): Size of the block removed at each randomly selected position (default is 10).
offset (float, optional): Size of the uncontaminated section at the beginning of each time series, as a proportion of its length (default is 0.1).
seed (bool, optional): Whether to use a seed for reproducibility (default is True).
limit_ratio (float, optional): Ratio limiting the number of series used by the model (default is 1).
split_ratio (float, optional): Ratio of series used to train the model (default is 0.6).
file_name (str, optional): Name of the dataset file (default is 'ts').
display (bool, optional): Whether to display the SHAP plots (default is False).
verbose (bool, optional): Whether to print detailed output (default is False).
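One plausible reading of `limit_ratio`, `split_ratio`, and `seed` is sketched below: keep a fraction of the M series, shuffle them reproducibly, and send the first `split_ratio` share to training. This interpretation, the fixed seed value 42, and the helper name `select_and_split` are assumptions for illustration; the package's own selection and splitting logic may differ.

```python
import numpy as np


def select_and_split(n_series, limit_ratio=1.0, split_ratio=0.6, seed=True):
    """Return (train_indices, test_indices) under a hypothetical reading of the parameters."""
    n_used = max(1, int(n_series * limit_ratio))        # series kept for the model
    rng = np.random.default_rng(42 if seed else None)   # fixed seed only when seed=True
    order = rng.permutation(n_series)[:n_used]
    n_train = int(n_used * split_ratio)
    return order[:n_train], order[n_train:]


train, test = select_and_split(20, limit_ratio=0.5, split_ratio=0.6)
print(len(train), len(test))  # 6 4
```

Shuffling before splitting keeps the training set from being biased toward whichever series appear first in the file.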

Returns

tuple: A tuple containing:
- shap_values (list): SHAP values for each series.
- shap_details (list): Detailed SHAP analysis results.

Notes

The contamination is applied to each time series using the specified pattern. The SHAP model is then used to generate explanations for the imputation results, which are logged in a local directory.
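The contamination step can be pictured with a simplified MCAR (missing completely at random) sketch: the first `offset` share of each series is left intact, and blocks of `block_size` values are set to NaN at random positions in the rest, up to `missing_rate` of that remainder (rounded down to whole blocks). Placing blocks on a grid of non-overlapping slots, measuring `missing_rate` against the part after the offset, and the helper name `mcar_sketch` are simplifications chosen here, not the package's contamination routine.

```python
import numpy as np


def mcar_sketch(data, missing_rate=0.4, block_size=10, offset=0.1, seed=True):
    """Return a copy of data (M x N) with NaN blocks inserted in each series."""
    contaminated = np.array(data, dtype=float, copy=True)
    n_values = contaminated.shape[1]
    start = int(n_values * offset)              # protected prefix
    n_slots = (n_values - start) // block_size  # non-overlapping block positions
    n_blocks = min(n_slots, int(missing_rate * (n_values - start) / block_size))
    rng = np.random.default_rng(42 if seed else None)

    for row in contaminated:
        for slot in rng.choice(n_slots, size=n_blocks, replace=False):
            begin = start + slot * block_size
            row[begin:begin + block_size] = np.nan  # row is a view, so this edits the copy
    return contaminated


data = np.random.default_rng(0).normal(size=(3, 100))
contaminated = mcar_sketch(data)
print(np.isnan(contaminated).sum(axis=1))  # [30 30 30]
```

With 90 values after the offset, a 0.4 rate allows 36 missing values, which rounds down to three blocks of 10; finer-grained placement would be needed to hit the rate exactly.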