imputegap.recovery.explainer package

Module contents

class imputegap.recovery.explainer.Explainer

Bases: object

A class to manage SHAP-based model explanations and feature extraction for time series datasets.

Methods

load_configuration(file_path=None)

Load categories and features from a TOML file.

extractor_pycatch(data, features_categories, features_list, do_catch24=True)

Extract features from time series data using pycatch22.

extractor_tsfresh(data, categories=["statistical", "temporal", "frequency", "shape"])

Extract features from time series data using TSFresh. The function supports filtering by feature categories and calculates a large set of features (up to 783) related to statistics, temporal dynamics, frequency analysis, and shape.

extractor_tsfel(data, frequency=None, categories=["spectral", "statistical", "temporal", "fractal"])

Extract features from time series data using TSFEL (Time Series Feature Extraction Library). This method calculates features based on the selected categories and optionally uses the sampling frequency to compute frequency-domain features.

print(shap_values, shap_details=None)

Print SHAP values and details for display.

convert_results(tmp, file, algo, descriptions, features, categories, mean_features, to_save)

Convert SHAP raw results into a refined format for display.

execute_shap_model(x_dataset, x_information, y_dataset, file, algorithm, splitter=10, extractor="pycatch", display=False, verbose=False)

Launch the SHAP model to explain the dataset features.

shap_explainer(input_data, algorithm="cdrec", params=None, extractor="pycatch", pattern="mcar", missing_rate=0.4, block_size=10, offset=0.1, seed=True, limit_ratio=1, split_ratio=0.6, file_name="ts", display=False, verbose=False)

Handle parameters and set variables to launch the SHAP model.

convert_results(tmp, file, algo, descriptions, features, categories, mean_features, to_save)

Convert SHAP raw results to a refined format for display.

Parameters

tmp : list

Current SHAP results.

file : str

Dataset used.

algo : str

Algorithm used for imputation.

descriptions : list

Descriptions of each feature.

features : list

Raw names of each feature.

categories : list

Categories of each feature.

mean_features : list

Mean values of each feature.

to_save : str

Path to save results.

Returns

list

A list of processed SHAP results.
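
Example

The conversion described above can be pictured as zipping the parallel per-feature lists into one record per feature. The following is a minimal sketch, not the library's implementation: the helper name refine_shap_results, the record layout, the ordering by absolute score, and the placeholder feature names are all assumptions made for illustration.

```python
def refine_shap_results(shap_scores, file, algo, descriptions, features, categories, mean_features):
    """Hypothetical helper: zip parallel per-feature lists into sorted records."""
    records = []
    for score, desc, name, cat, mean in zip(
        shap_scores, descriptions, features, categories, mean_features
    ):
        # One tuple per feature: dataset, algorithm, score, then feature metadata.
        records.append((file, algo, round(float(score), 4), desc, name, cat, mean))
    # Assumption: the most influential features (largest absolute score) come first.
    records.sort(key=lambda record: abs(record[2]), reverse=True)
    return records


rows = refine_shap_results(
    shap_scores=[0.12, 0.57, 0.31],
    file="ts",
    algo="cdrec",
    descriptions=["Description of feature A", "Description of feature B", "Description of feature C"],
    features=["feature_a", "feature_b", "feature_c"],
    categories=["Geometry", "Correlation", "Geometry"],
    mean_features=[0.02, 4.0, -0.3],
)
for row in rows:
    print(row)
```

Keeping every field in a single tuple makes each record easy to print or write to a file as one line.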

execute_shap_model(x_dataset, x_information, y_dataset, file, algorithm, splitter=10, extractor='pycatch', display=False, verbose=False)

Launch the SHAP model for explaining the features of the dataset.

Parameters

x_dataset : numpy.ndarray

Dataset of extracted features, with descriptions.

x_information : list

Descriptions of all features grouped by categories.

y_dataset : numpy.ndarray

RMSE labels of each series.

file : str

Dataset used for SHAP analysis.

algorithm : str

Algorithm used for imputation (e.g., 'cdrec', 'stmvl', 'iim', 'mrnn').

splitter : int, optional

Split ratio for data training and testing (default is 10).

extractor : str, optional

Feature extractor used for the regression (e.g., 'pycatch', 'tsfel'; default is 'pycatch').

display : bool, optional

Whether to display the SHAP plots (default is False).

verbose : bool, optional

Whether to print detailed output (default is False).

Returns

list

Results of the SHAP explainer model.
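
Example

The inputs described above pair one feature vector per series (x_dataset) with one RMSE label per series (y_dataset). The sketch below only illustrates how such labels and a train/test split could be prepared with NumPy; it does not call the library. The helper rmse_per_series, the random stand-in data, and the reading of splitter as a row index are assumptions.

```python
import numpy as np


def rmse_per_series(original, imputed):
    """Root-mean-square error of each row, giving one label per series."""
    return np.sqrt(np.mean((original - imputed) ** 2, axis=1))


rng = np.random.default_rng(0)
original = rng.random((20, 50))                           # 20 series, 50 values each
imputed = original + rng.normal(0, 0.1, original.shape)   # stand-in for an imputation result

y_dataset = rmse_per_series(original, imputed)            # shape (20,)
x_dataset = rng.random((20, 22))                          # stand-in feature matrix

# Assumption: the first `splitter` series are used for training, the rest for testing.
splitter = 10
x_train, x_test = x_dataset[:splitter], x_dataset[splitter:]
y_train, y_test = y_dataset[:splitter], y_dataset[splitter:]
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```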

extractor_pycatch(data, features_categories, features_list, do_catch24=True)

Extract features from time series data using pycatch22.

Parameters

data : numpy.ndarray

Time series dataset for feature extraction.

features_categories : dict

Dictionary that maps feature names to categories.

features_list : dict

Dictionary of all features expected.

do_catch24 : bool, optional

Flag to compute the mean and standard deviation for Catch24 (default is True).

Returns

tuple

A tuple containing:

  • results (dict): A dictionary of feature values by feature names.

  • descriptions (list): A list of tuples containing feature names, categories, and descriptions.
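
Example

To make the shape of the returned tuple concrete, the sketch below builds a results dictionary and a descriptions list from already-computed feature values. It is an illustration only: the helper describe_features, the placeholder feature names, and the layout of features_categories (each category key holding a list of feature names, one plausible reading of the parameter description) are assumptions.

```python
def describe_features(values, features_categories, features_list):
    """Hypothetical helper: attach a category and description to each feature value."""
    results, descriptions = {}, []
    for name, value in values.items():
        # Assumption: each category key holds the list of feature names it contains.
        category = next(
            (cat for cat, names in features_categories.items() if name in names),
            "unknown",
        )
        results[name] = value
        descriptions.append((name, category, features_list.get(name, name)))
    return results, descriptions


# Stand-in values; in practice they would come from the feature extractor.
values = {"feature_a": 0.42, "feature_b": 3.0}
features_categories = {"Geometry": ["feature_a"], "Correlation": ["feature_b"]}
features_list = {
    "feature_a": "Description of feature A",
    "feature_b": "Description of feature B",
}

results, descriptions = describe_features(values, features_categories, features_list)
print(results)
print(descriptions)
```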

extractor_tsfel(data, frequency=None, categories=['spectral', 'statistical', 'temporal', 'fractal'])

Extract features using TSFEL (Time Series Feature Extraction Library).

This function extracts features from the input time series data based on the specified categories. The categories determine the type of features to compute, such as spectral, statistical, temporal, or fractal features. Optionally, a frequency value can be provided to compute frequency-specific features.

Parameters

data (numpy.ndarray):

2D array of shape (M, N), where M is the number of time series and N is the number of values per time series. Each row represents a separate time series.

frequency (float, optional):

The sampling frequency of the time series data. This is used for spectral feature calculations (e.g., FFT-based features). If None, spectral features will be computed using default assumptions.

categories (list, optional):

A list of categories to extract. Valid categories are:

  • "spectral": Extract frequency-domain features (e.g., FFT, spectral entropy).

  • "statistical": Extract basic statistical features (e.g., mean, variance, skewness).

  • "temporal": Extract temporal-domain features (e.g., autocorrelation, zero crossings).

  • "fractal": Extract fractal-related features (e.g., Hurst exponent, fractal dimension).

By default, all four categories are extracted.

Returns

dict:

A dictionary where keys are feature names and values are the computed feature values for the entire dataset (aggregated over all time series).

list:
A list of tuples, where each tuple contains:
  • Feature name (str): The name of the feature.

  • Category (str): The category to which the feature belongs.

  • Formatted feature name (str): A human-readable version of the feature name.

Example

>>> import numpy as np
>>> from imputegap.recovery.explainer import Explainer
>>> data = np.random.rand(5, 100)  # 5 time series, each with 100 values
>>> results, descriptions = Explainer.extractor_tsfel(data, frequency=50, categories=["statistical", "temporal"])
>>> print(results)
>>> print(descriptions)

Notes

  • This function requires TSFEL to be installed: pip install tsfel.

  • Categories can be customized to extract only the desired features, reducing computation time.

extractor_tsfresh(data, categories=['statistical', 'temporal', 'shape', 'frequency'])

Extract features using tsfresh and group them into 4 categories: statistical, temporal, frequency, and shape-based.

Parameters

data (numpy.ndarray):

2D array of shape (M, N), where M is the number of series and N is the number of values per series.

categories (list):

List of categories to extract. Must include one or more of: ["statistical", "temporal", "frequency", "shape"].

Returns

dict:

A dictionary with feature names as keys and their aggregated values as values.

list:

A list of tuples (feature_name, category, formatted_feature_name).
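
Example

The return value holds one aggregated number per feature rather than one per series. The sketch below shows what such an aggregation could look like with NumPy alone, without calling tsfresh. The three statistics, the use of the mean as the aggregate, and the helper name aggregate_features are assumptions chosen for illustration.

```python
import numpy as np


def aggregate_features(data):
    """Compute a few per-series statistics, then average each over all series."""
    per_series = {
        "mean": data.mean(axis=1),
        "variance": data.var(axis=1),
        "maximum": data.max(axis=1),
    }
    # Assumption: "aggregated" means the mean of the per-series values.
    results = {name: float(np.mean(vals)) for name, vals in per_series.items()}
    descriptions = [(name, "statistical", name.replace("_", " ").title()) for name in results]
    return results, descriptions


data = np.array([[1.0, 2.0, 3.0], [3.0, 5.0, 7.0]])  # 2 series, 3 values each
results, descriptions = aggregate_features(data)
print(results)
print(descriptions)
```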

load_configuration(file_path=None)

Load categories and features from a TOML file.

Parameters

file_path : str, optional

The path to the TOML file (default is None). If None, it loads the default configuration file.

Returns

tuple

A tuple containing the categories, the features, and the config.

print(shap_values, shap_details=None)

Print SHAP values and details for display.

Parameters

shap_values : list

The SHAP values and results of the SHAP analysis.

shap_details : list, optional

Input and output data of the regression, if available (default is None).

Returns

None

shap_explainer(input_data, algorithm='cdrec', params=None, extractor='pycatch', pattern='mcar', missing_rate=0.4, block_size=10, offset=0.1, seed=True, limit_ratio=1, split_ratio=0.6, file_name='ts', display=False, verbose=False)

Handle parameters and set variables to launch the SHAP model.

Parameters

input_data : numpy.ndarray

The original time series dataset.

algorithm : str, optional

The algorithm used for imputation (default is 'cdrec'). Valid values: 'cdrec', 'stmvl', 'iim', 'mrnn'.

params : dict, optional

Parameters for the algorithm.

extractor : str, optional

Extractor used to compute the features of the data (default is 'pycatch'). Valid values: 'pycatch', 'tsfel', 'tsfresh'.

pattern : str, optional

Contamination pattern to apply (default is 'mcar').

missing_rate : float, optional

Fraction of missing values per series (default is 0.4).

block_size : int, optional

Size of the block to remove at each random position selected (default is 10).

offset : float, optional

Size of the uncontaminated section at the beginning of each time series, as a fraction of its length (default is 0.1).

seed : bool, optional

Whether to use a seed for reproducibility (default is True).

limit_ratio : float, optional

Ratio limiting the number of series used by the model (default is 1).

split_ratio : float, optional

Ratio of series used for training the model (default is 0.6).

file_name : str, optional

Name of the dataset file (default is 'ts').

display : bool, optional

Whether to display the SHAP plots (default is False).

verbose : bool, optional

Whether to print detailed output (default is False).

Returns

tuple

A tuple containing:

  • shap_values : list

    SHAP values for each series.

  • shap_details : list

    Detailed SHAP analysis results.

Notes

The contamination is applied to each time series using the specified method. The SHAP model is then used to generate explanations for the imputation results, which are logged in a local directory.
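
Example

To show how missing_rate, block_size, offset, and seed interact, the sketch below removes random blocks from every series with NumPy. It is a simplified stand-in, not the library's contamination routine: the function contaminate_mcar, the fixed seed value 42, and the block-placement rule are assumptions. Because blocks may overlap, the resulting share of missing values per series is at most missing_rate.

```python
import numpy as np


def contaminate_mcar(data, missing_rate=0.4, block_size=10, offset=0.1, seed=True):
    """Hypothetical MCAR-style contamination: remove random blocks from every series."""
    rng = np.random.default_rng(42 if seed else None)  # assumption: fixed seed value
    contaminated = data.astype(float).copy()
    n_series, n_values = contaminated.shape
    start = int(n_values * offset)                       # protected prefix of each series
    n_blocks = int(n_values * missing_rate) // block_size
    for row in contaminated:                             # each row is a view into the array
        for _ in range(n_blocks):
            pos = rng.integers(start, n_values - block_size + 1)
            row[pos:pos + block_size] = np.nan           # blocks may overlap
    return contaminated


data = np.ones((5, 100))                                 # 5 series, 100 values each
contaminated = contaminate_mcar(data)
print(np.isnan(contaminated).mean(axis=1))               # share of missing values per series
```

With seed=True the same positions are removed on every call, which keeps the resulting RMSE labels comparable between runs.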