imputegap.recovery.explainer package
Submodules
Module contents
- class imputegap.recovery.explainer.Explainer
Bases: object
A class to manage SHAP-based model explanations and feature extraction for time series datasets.
Methods
- load_configuration(file_path=None)
Load categories and features from a TOML file.
- extractor_pycatch(data, features_categories, features_list, do_catch24=True)
Extract features from time series data using pycatch22.
- extractor_tsfresh(data, categories=["statistical", "temporal", "frequency", "shape"])
Extract features from time series data using TSFresh. The function supports filtering by feature categories and calculates a large set of features (up to 783) related to statistics, temporal dynamics, frequency analysis, and shape.
- extractor_tsfel(data, frequency=None, categories=["spectral", "statistical", "temporal", "fractal"])
Extract features from time series data using TSFEL (Time Series Feature Extraction Library). This method calculates features based on the selected categories and optionally uses the sampling frequency to compute frequency-domain features.
- print(shap_values, shap_details=None)
Print SHAP values and details for display.
- convert_results(tmp, file, algo, descriptions, features, categories, mean_features, to_save)
Convert SHAP raw results into a refined format for display.
- execute_shap_model(x_dataset, x_information, y_dataset, file, algorithm, splitter=10, extractor="pycatch", display=False, verbose=False)
Launch the SHAP model to explain the dataset features.
- shap_explainer(input_data, algorithm="cdrec", params=None, extractor="pycatch", pattern="mcar", missing_rate=0.4, block_size=10, offset=0.1, seed=True, limit_ratio=1, split_ratio=0.6, file_name="ts", display=False, verbose=False)
Handle parameters and set variables to launch the SHAP model.
- convert_results(tmp, file, algo, descriptions, features, categories, mean_features, to_save)
Convert SHAP raw results to a refined format for display.
Parameters
- tmp (list):
Current SHAP results.
- file (str):
Dataset used.
- algo (str):
Algorithm used for imputation.
- descriptions (list):
Descriptions of each feature.
- features (list):
Raw names of each feature.
- categories (list):
Categories of each feature.
- mean_features (list):
Mean values of each feature.
- to_save (str):
Path to save results.
Returns
- list:
A list of processed SHAP results.
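As a rough illustration of the reshaping described above, the sketch below zips per-feature SHAP scores with their names, categories, and descriptions, then sorts the rows by contribution. The helper name `refine_shap_rows`, the row layout, and the sample values are assumptions made for this example; the library's actual output format may differ.

```python
def refine_shap_rows(shap_scores, features, categories, descriptions, file, algo):
    """Combine parallel per-feature lists into sorted, display-ready rows.

    Hypothetical helper: the real convert_results may use a different layout.
    """
    total = sum(abs(s) for s in shap_scores) or 1.0  # avoid division by zero
    rows = []
    for score, name, category, description in zip(
        shap_scores, features, categories, descriptions
    ):
        rows.append(
            {
                "dataset": file,
                "algorithm": algo,
                "feature": name,
                "category": category,
                "description": description,
                # Share of the total absolute SHAP mass, in percent.
                "rate": round(100.0 * abs(score) / total, 2),
            }
        )
    # Most influential features first.
    return sorted(rows, key=lambda row: row["rate"], reverse=True)


# Made-up values for illustration only.
rows = refine_shap_rows(
    shap_scores=[0.02, 0.06, 0.12],
    features=["f_a", "f_b", "f_c"],
    categories=["Geometry", "Correlation", "Trend"],
    descriptions=["feature A", "feature B", "feature C"],
    file="ts",
    algo="cdrec",
)
for row in rows:
    print(f'{row["feature"]:<6}{row["category"]:<14}{row["rate"]:>6.2f}%')
```

Expressing each score as a share of the total makes rows comparable across datasets, which is one plausible reason to refine raw SHAP values before display.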
- execute_shap_model(x_dataset, x_information, y_dataset, file, algorithm, splitter=10, extractor='pycatch', display=False, verbose=False)
Launch the SHAP model for explaining the features of the dataset.
Parameters
- x_dataset (numpy.ndarray):
Dataset of feature extraction with descriptions.
- x_information (list):
Descriptions of all features grouped by categories.
- y_dataset (numpy.ndarray):
RMSE labels of each series.
- file (str):
Dataset used for SHAP analysis.
- algorithm (str):
Algorithm used for imputation (e.g., ‘cdrec’, ‘stmvl’, ‘iim’, ‘mrnn’).
- splitter (int, optional):
Split ratio for data training and testing (default is 10).
- extractor (str):
Feature extractor used for the regression (e.g., ‘pycatch’, ‘tsfel’).
- display (bool, optional):
Whether to display the SHAP plots (default is False).
- verbose (bool, optional):
Whether to print detailed output (default is False).
Returns
- list:
Results of the SHAP explainer model.
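The `y_dataset` argument holds one RMSE label per series. The sketch below shows one way such labels could be computed, assuming the error is measured only at positions removed by contamination (marked with NaN in the contaminated copy). The helper name `rmse_per_series` and that masking convention are assumptions for illustration, not taken from the library.

```python
import math


def rmse_per_series(original, contaminated, imputed):
    """Return one RMSE value per series, measured on the contaminated positions.

    Assumption for this sketch: missing values are marked with NaN in
    `contaminated`, and all three inputs share the same M x N shape.
    """
    labels = []
    for truth, holes, recovered in zip(original, contaminated, imputed):
        squared_errors = [
            (t - r) ** 2
            for t, h, r in zip(truth, holes, recovered)
            if math.isnan(h)
        ]
        if squared_errors:
            labels.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
        else:
            labels.append(0.0)  # nothing was removed from this series
    return labels


nan = float("nan")
original = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
contaminated = [[1.0, nan, nan, 4.0], [5.0, 6.0, 7.0, 8.0]]
imputed = [[1.0, 2.5, 2.5, 4.0], [5.0, 6.0, 7.0, 8.0]]

y_dataset = rmse_per_series(original, contaminated, imputed)
print(y_dataset)  # [0.5, 0.0]
```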
- extractor_pycatch(data, features_categories, features_list, do_catch24=True)
Extract features from time series data using pycatch22.
Parameters
- data (numpy.ndarray):
Time series dataset for feature extraction.
- features_categories (dict):
Dictionary that maps feature names to categories.
- features_list (dict):
Dictionary of all features expected.
- do_catch24 (bool, optional):
Flag to compute the mean and standard deviation for Catch24 (default is True).
Returns
- tuple:
A tuple containing:
- results (dict): A dictionary of feature values by feature names.
- descriptions (list): A list of tuples containing feature names, categories, and descriptions.
- extractor_tsfel(data, frequency=None, categories=['spectral', 'statistical', 'temporal', 'fractal'])
Extract features using TSFEL (Time Series Feature Extraction Library).
This function extracts features from the input time series data based on the specified categories. The categories determine the type of features to compute, such as spectral, statistical, temporal, or fractal features. Optionally, a frequency value can be provided to compute frequency-specific features.
Parameters
- data (numpy.ndarray):
2D array of shape (M, N), where M is the number of time series and N is the number of values per time series. Each row represents a separate time series.
- frequency (float, optional):
The sampling frequency of the time series data. This is used for spectral feature calculations (e.g., FFT-based features). If None, spectral features will be computed using default assumptions.
- categories (list, optional):
A list of categories to extract. Valid categories are:
“spectral”: Extract frequency-domain features (e.g., FFT, spectral entropy).
“statistical”: Extract basic statistical features (e.g., mean, variance, skewness).
“temporal”: Extract temporal-domain features (e.g., autocorrelation, zero crossings).
“fractal”: Extract fractal-related features (e.g., Hurst exponent, fractal dimension).
By default, all four categories are extracted.
Returns
- dict:
A dictionary where keys are feature names and values are the computed feature values for the entire dataset (aggregated over all time series).
- list:
A list of tuples, where each tuple contains:
Feature name (str): The name of the feature.
Category (str): The category to which the feature belongs.
Formatted feature name (str): A human-readable version of the feature name.
Example
>>> import numpy as np
>>> from imputegap.recovery.explainer import Explainer
>>> data = np.random.rand(5, 100)  # 5 time series, each with 100 values
>>> results, descriptions = Explainer.extractor_tsfel(data, frequency=50, categories=["statistical", "temporal"])
>>> print(results)
>>> print(descriptions)
Notes
This function requires TSFEL to be installed: pip install tsfel.
Categories can be customized to extract only the desired features, reducing computation time.
- extractor_tsfresh(data, categories=['statistical', 'temporal', 'shape', 'frequency'])
Extract features using tsfresh and group them into 4 categories: statistical, temporal, frequency, and shape-based.
Parameters
- data (numpy.ndarray):
2D array of shape (M, N), where M is the number of series and N is the number of values per series.
- categories (list):
List of categories to extract. Must include one or more of: [“statistical”, “temporal”, “frequency”, “shape”].
Returns
- dict:
A dictionary with feature names as keys and their aggregated values as values.
- list:
A list of tuples (feature_name, category, formatted_feature_name).
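To make the two return values more concrete, the sketch below filters per-series features by category, averages them over all series, and builds the `(feature_name, category, formatted_feature_name)` tuples. It does not call tsfresh. The keyword map, the use of the mean as the aggregate, the formatting rule, and the feature names are all assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical keyword map; the library's real grouping rules may differ.
CATEGORY_KEYWORDS = {
    "statistical": ("mean", "variance", "skewness"),
    "temporal": ("autocorrelation", "trend"),
    "frequency": ("fft", "spectral"),
    "shape": ("peaks", "symmetry"),
}


def categorize(feature_name):
    """Return the first category whose keyword occurs in the feature name."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in feature_name for keyword in keywords):
            return category
    return None


def aggregate_features(per_series_features, categories):
    """Average each feature over all series and keep the requested categories."""
    results, descriptions = {}, []
    for name in per_series_features[0]:
        category = categorize(name)
        if category not in categories:
            continue
        results[name] = mean(row[name] for row in per_series_features)
        formatted = name.replace("__", " ").replace("_", " ").title()
        descriptions.append((name, category, formatted))
    return results, descriptions


# One dictionary of (made-up) feature values per series.
per_series = [
    {"value__mean": 1.0, "value__fft_coefficient": 4.0, "value__number_peaks": 2.0},
    {"value__mean": 3.0, "value__fft_coefficient": 6.0, "value__number_peaks": 4.0},
]
results, descriptions = aggregate_features(per_series, ["statistical", "shape"])
print(results)
print(descriptions)
```

Restricting `categories` skips whole groups of features, which is why the frequency feature is absent from the output here.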
- load_configuration(file_path=None)
Load categories and features from a TOML file.
Parameters
- file_path (str, optional):
The path to the TOML file (default is None). If None, it loads the default configuration file.
Returns
- tuple:
A tuple containing the categories and features dictionaries, together with the loaded config.
- print(shap_values, shap_details=None)
Print SHAP values and details for display.
Parameters
- shap_values (list):
The SHAP values and results of the SHAP analysis.
- shap_details (list, optional):
Input and output data of the regression, if available (default is None).
Returns
None
- shap_explainer(input_data, algorithm='cdrec', params=None, extractor='pycatch', pattern='mcar', missing_rate=0.4, block_size=10, offset=0.1, seed=True, limit_ratio=1, split_ratio=0.6, file_name='ts', display=False, verbose=False)
Handle parameters and set variables to launch the SHAP model.
Parameters
- input_data (numpy.ndarray):
The original time series dataset.
- algorithm (str, optional):
The algorithm used for imputation (default is ‘cdrec’). Valid values: ‘cdrec’, ‘stmvl’, ‘iim’, ‘mrnn’.
- params (dict, optional):
Parameters for the algorithm (default is None).
- pattern (str, optional):
Contamination pattern to apply (default is ‘mcar’).
- extractor (str, optional):
Extractor used to compute the features of the data (default is ‘pycatch’). Valid values: ‘pycatch’, ‘tsfel’, ‘tsfresh’.
- missing_rate (float, optional):
Rate of missing values per series, expressed as a fraction (default is 0.4, i.e., 40%).
- block_size (int, optional):
Size of the block to remove at each randomly selected position (default is 10).
- offset (float, optional):
Size of the uncontaminated section at the beginning of the time series (default is 0.1).
- seed (bool, optional):
Whether to use a seed for reproducibility (default is True).
- limit_ratio (float, optional):
Ratio limiting the number of series used by the model (default is 1).
- split_ratio (float, optional):
Ratio limiting the number of series used for training the model (default is 0.6).
- file_name (str, optional):
Name of the dataset file (default is ‘ts’).
- display (bool, optional):
Whether to display the SHAP plots (default is False).
- verbose (bool, optional):
Whether to print detailed output (default is False).
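The interplay of `limit_ratio` and `split_ratio` can be pictured with a small sketch. It assumes both are fractions in (0, 1], that counts are truncated with `int()`, and that the series order is preserved. The helper name `select_and_split` is hypothetical, and the library may select or shuffle series differently.

```python
def select_and_split(series, limit_ratio=1.0, split_ratio=0.6):
    """Keep a fraction of the series, then split them into train and test sets.

    Assumptions for this sketch: ratios are fractions in (0, 1], counts are
    truncated with int(), and the original series order is preserved.
    """
    if not 0 < limit_ratio <= 1 or not 0 < split_ratio <= 1:
        raise ValueError("ratios must lie in (0, 1]")
    # limit_ratio caps how many series take part at all (at least one).
    kept = series[: max(1, int(len(series) * limit_ratio))]
    # split_ratio decides how many of the kept series are used for training.
    cut = int(len(kept) * split_ratio)
    return kept[:cut], kept[cut:]


series = [[float(i)] * 4 for i in range(10)]  # 10 toy series
train, test = select_and_split(series, limit_ratio=1.0, split_ratio=0.6)
print(len(train), len(test))  # 6 4
```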
Returns
- tuple:
A tuple containing:
- shap_values (list): SHAP values for each series.
- shap_details (list): Detailed SHAP analysis results.
Notes
The contamination is applied to each time series using the specified pattern. The SHAP model is then used to generate explanations for the imputation results, which are logged in a local directory.
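The contamination step mentioned in the note can be sketched for a single series as follows. This is a simplified stand-in rather than the library's routine: it assumes `missing_rate` and `offset` are fractions of the series length, that the input contains no NaN values, and that a fixed seed value (42, chosen arbitrarily here) is used when `seed` is True. The helper name `contaminate_blocks` is hypothetical.

```python
import math
import random


def contaminate_blocks(series, missing_rate=0.4, block_size=10, offset=0.1, seed=True):
    """Return a copy of `series` with blocks of values replaced by NaN.

    Assumptions for this sketch: `missing_rate` and `offset` are fractions of
    the series length, and a fixed seed value of 42 is used when `seed` is True.
    """
    rng = random.Random(42) if seed else random.Random()
    n = len(series)
    protected = int(n * offset)      # leading values that are never removed
    target = int(n * missing_rate)   # number of values to remove
    if target > n - protected:
        raise ValueError("missing_rate is too high for the given offset")
    out = list(series)
    missing = 0
    while missing < target:
        # Pick a random block start outside the protected prefix.
        start = rng.randrange(protected, n)
        for i in range(start, min(start + block_size, n)):
            if missing < target and not math.isnan(out[i]):
                out[i] = math.nan
                missing += 1
    return out


clean = [float(i) for i in range(100)]
dirty = contaminate_blocks(clean)
print(sum(math.isnan(v) for v in dirty))  # 40
```

With `seed=True` the same positions are removed on every call, which keeps the resulting explanations comparable between runs.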