imputegap.recovery.manager package
Module contents
- class imputegap.recovery.manager.TimeSeries(verbose=True)
Bases: object
Class for managing and manipulating time series data.
This class allows importing, normalizing, and visualizing time series datasets. It also provides methods to contaminate the datasets with missing values and plot results.
Methods
- __init__(verbose=True) :
Initializes the TimeSeries object.
- import_matrix(data=None) :
Imports a matrix of time series data.
- load_series(data, nbr_series=None, nbr_val=None, header=False, normalizer="z_score", replace_nan=False, reverse=False, verbose=True) :
Loads time series data from a file or predefined dataset.
- print(nbr_val=10, nbr_series=7, view_by_series=True) :
Prints a limited number of time series from the dataset.
- print_results(metrics, algorithm="", text="Results") :
Prints the results of the imputation process.
- normalize(normalizer="z_score", data=None, verbose=True) :
Normalizes the time series dataset.
- plot(input_data, incomp_data=None, recov_data=None, nbr_series=None, nbr_val=None, series_range=None, subplot=False, size=(16, 8), algorithm=None, save_path="./imputegap_assets", style="default", cont_rate=None, grid=True, reverse=True, legends=True, display=True, verbose=True) :
Plots the time series data, including raw, contaminated, or imputed data.
- import_matrix(data=None)
Imports a matrix of time series data.
The data can be provided as a list or a NumPy array. The format is (Series, Values), where series are separated by spaces and values are separated by newline characters.
Parameters
- data : list or numpy.ndarray, optional
The matrix of time series data to import.
Returns
- TimeSeries
The TimeSeries object with the imported data.
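As a minimal sketch of the two accepted input types, the snippet below builds the same small matrix once as a nested list and once as a NumPy array, laid out as (Series, Values). The values are made up for illustration, and the snippet does not call ImputeGAP itself.

```python
import numpy as np

# Three series with four values each, laid out as (Series, Values).
# The numbers are illustrative only.
as_list = [
    [0.5, 0.7, 0.6, 0.9],  # series 0
    [1.2, 1.1, 1.3, 1.0],  # series 1
    [3.0, 2.8, 2.9, 3.1],  # series 2
]

# The same matrix as a NumPy array.
as_array = np.asarray(as_list, dtype=float)

n_series, n_values = as_array.shape
print(f"{n_series} series of {n_values} values each")
```

Either object could then be passed as the `data` argument of `import_matrix`.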
- load_series(data, nbr_series=None, nbr_val=None, header=False, normalizer='z_score', replace_nan=False, reverse=False, verbose=True)
Loads time series data from a file or predefined dataset.
The data is loaded as a matrix of shape (Values, Series). You can limit the number of series or values per series for computational efficiency.
Parameters
- data : str
The file path or name of a predefined dataset (e.g., 'bafu.txt').
- nbr_series : int, optional
The maximum number of series to load.
- nbr_val : int, optional
The maximum number of values per series.
- header : bool, optional
Whether the dataset has a header. Default is False.
- normalizer : str, optional
The normalization technique to use. Options are "z_score" or "min_max". Default is "z_score". To keep the raw data, set normalizer=None.
- replace_nan : bool, optional
Whether to replace NaN values already present in the dataset with 0.
- reverse : bool, optional
Which dimension of the dataset comes first: series or values/timestamps. Default is False. With True, the shape is e.g. (50, 1000): 50 sensors (rows) of 1000 values/timestamps (columns). With False, the shape is e.g. (1000, 50): 1000 values/timestamps (rows) for 50 sensors (columns).
- verbose : bool, optional
Whether to print information. Default is True.
Returns
- TimeSeries
The TimeSeries object with the loaded data.
Example
>>> ts.load_series(utils.search_path("eeg-alcohol"), nbr_series=50, nbr_val=100)
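The effect of the size limits, `replace_nan`, and the two orientations can be sketched on a plain NumPy array. The helper `limit_and_clean` below is hypothetical and is not part of ImputeGAP. It assumes a (Values, Series) input and treats `reverse=True` as a simple transpose to (Series, Values), which only illustrates the two layouts and may differ from what `load_series` does internally.

```python
import numpy as np

def limit_and_clean(matrix, nbr_series=None, nbr_val=None,
                    replace_nan=False, reverse=False):
    """Hypothetical helper mimicking the documented options on a (Values, Series) array."""
    out = np.array(matrix, dtype=float)   # copy, so the input stays untouched
    out = out[:nbr_val, :nbr_series]      # a None bound keeps the whole axis
    if replace_nan:
        out[np.isnan(out)] = 0.0          # existing NaN values become 0
    if reverse:
        out = out.T                       # (Values, Series) -> (Series, Values)
    return out

# 5 values/timestamps (rows) for 4 series (columns), with one missing value.
raw = np.arange(20, dtype=float).reshape(5, 4)
raw[1, 2] = np.nan

trimmed = limit_and_clean(raw, nbr_series=3, nbr_val=4, replace_nan=True)
flipped = limit_and_clean(raw, reverse=True)

print(trimmed.shape)  # (4, 3)
print(flipped.shape)  # (4, 5)
```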
- normalize(normalizer='z_score', data=None, verbose=True)
Normalizes the time series dataset.
Supported normalization techniques are "z_score" and "min_max". The method also logs the execution time of the normalization process.
Parameters
- normalizer : str, optional
The normalization technique to use. Options are "z_score" or "min_max". Default is "z_score".
- data : numpy.ndarray, optional
A matrix to normalize instead of the data held by the object.
- verbose : bool, optional
Whether to display the normalization information. Default is True.
Returns
- numpy.ndarray
The normalized time series data.
Example
>>> ts.normalize(normalizer="z_score")
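For reference, the two techniques can be sketched with their textbook definitions. The functions below are illustrative, not ImputeGAP's implementation. In particular, they compute the statistics over the whole matrix, which is an assumption: the documentation above does not say whether the library normalizes globally or per series.

```python
import numpy as np

def z_score(matrix):
    """Textbook z-score: subtract the mean, divide by the standard deviation."""
    matrix = np.asarray(matrix, dtype=float)
    return (matrix - matrix.mean()) / matrix.std()

def min_max(matrix):
    """Textbook min-max scaling: map the smallest value to 0 and the largest to 1."""
    matrix = np.asarray(matrix, dtype=float)
    lo, hi = matrix.min(), matrix.max()
    return (matrix - lo) / (hi - lo)

# Illustrative data only.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

standardized = z_score(data)   # mean 0 and standard deviation 1 overall
rescaled = min_max(data)       # all values within [0, 1]
```

Both sketches divide by zero on a constant matrix, so a real implementation would need to guard against that case.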
- plot(input_data, incomp_data=None, recov_data=None, nbr_series=None, nbr_val=None, series_range=None, subplot=False, size=(16, 8), algorithm=None, save_path='./imputegap_assets', style='default', cont_rate=None, grid=True, reverse=True, legends=True, display=True, verbose=True)
Plots the time series data, including raw, contaminated, or imputed data.
Parameters
- input_data : numpy.ndarray
The original time series data without contamination.
- incomp_data : numpy.ndarray, optional
The contaminated time series data.
- recov_data : numpy.ndarray, optional
The imputed time series data.
- nbr_series : int, optional
The maximum number of series to plot.
- nbr_val : int, optional
The maximum number of values per series to plot.
- series_range : int, optional
The index of a specific series to plot. If set, only this series will be plotted.
- subplot : bool, optional
Whether to plot each time series in its own subplot (True) or all series in the same plot (False).
- size : tuple, optional
Size of the plot in inches. Default is (16, 8).
- algorithm : str, optional
Name of the algorithm used for imputation.
- save_path : str, optional
Path to save the plot locally.
- style : str, optional
Name of the style used for the plot ("default", or "mono" to make specific series more visible).
- cont_rate : str, optional
Percentage of contamination in each series to plot.
- grid : bool, optional
Whether to plot in a grid.
- reverse : bool, optional
Whether to reverse the plot so that timestamps are on the x-axis and values on the y-axis.
- legends : bool, optional
Whether to display the legend in the plot. Default is True.
- display : bool, optional
Whether to display the plot. Default is True.
- verbose : bool, optional
Whether to display the plot information. Default is True.
Returns
- str or None
The file path of the saved plot, if applicable.
Example
>>> ts.plot(input_data=ts.data, nbr_series=9, nbr_val=100, save_path="./imputegap_assets")  # plain data
>>> ts.plot(ts.data, ts_m, nbr_series=9, subplot=True, save_path="./imputegap_assets")  # contamination
>>> ts.plot(input_data=ts.data, incomp_data=ts_m, recov_data=imputer.recov_data, nbr_series=9, subplot=True, save_path="./imputegap_assets")  # imputation
- print(nbr_val=10, nbr_series=7, view_by_series=True)
Prints a limited number of time series from the dataset.
Parameters
- nbr_val : int, optional
The number of timestamps to print. Default is 10. Use -1 for no restriction.
- nbr_series : int, optional
The number of series to print. Default is 7. Use -1 for no restriction.
- view_by_series : bool, optional
Whether to view by series (True) or by values (False). Default is True.
Returns
None
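The "-1 for no restriction" convention can be sketched as a slicing rule. The helper `preview` below is hypothetical and not part of ImputeGAP. It assumes a (Series, Values) array and assumes that `view_by_series=True` keeps one series per row, neither of which is confirmed by the text above.

```python
import numpy as np

def preview(matrix, nbr_val=10, nbr_series=7, view_by_series=True):
    """Hypothetical helper: slice a (Series, Values) array using the documented limits."""
    data = np.asarray(matrix)
    # -1 means "no restriction", which maps to an open-ended slice (None).
    series_stop = None if nbr_series == -1 else nbr_series
    values_stop = None if nbr_val == -1 else nbr_val
    view = data[:series_stop, :values_stop]
    # Assumed meaning: rows are series when view_by_series is True,
    # rows are timestamps otherwise.
    return view if view_by_series else view.T

# 10 series of 20 values each, illustrative data only.
data = np.arange(200).reshape(10, 20)

default_view = preview(data)                             # 7 series, 10 values
full_view = preview(data, nbr_val=-1, nbr_series=-1)     # everything
by_values = preview(data, nbr_val=5, nbr_series=2, view_by_series=False)

print(default_view.shape, full_view.shape, by_values.shape)
```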
- print_results(metrics, algorithm='', text='Results')
Prints the results of the imputation process.
Parameters
- metrics : dict
A dictionary containing the imputation metrics to display.
- algorithm : str, optional
The name of the algorithm used for imputation.
- text : str, optional
Output text to help the user. Default is 'Results'.
Returns
None
Example
>>> ts.print_results(imputer.metrics, imputer.algorithm)
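To make the shape of the `metrics` argument concrete, here is a minimal sketch that builds a metrics dictionary by hand and formats it. The metric names ("RMSE", "MAE"), the algorithm label "demo", and the helper `format_results` are all assumptions for illustration. The actual keys and output layout produced by ImputeGAP are not specified in this section.

```python
import numpy as np

def format_results(metrics, algorithm="", text="Results"):
    """Hypothetical formatter: one 'name = value' line per metric."""
    header = f"{text} ({algorithm}):" if algorithm else f"{text}:"
    width = max((len(name) for name in metrics), default=0)
    lines = [f"{name:<{width}} = {value}" for name, value in metrics.items()]
    return "\n".join([header] + lines)

# Illustrative ground truth and recovered values.
truth = np.array([1.0, 2.0, 3.0, 4.0])
recovered = np.array([1.0, 2.0, 4.0, 4.0])
error = recovered - truth

# Assumed metric names; plain floats keep the printed values simple.
metrics = {
    "RMSE": float(np.sqrt(np.mean(error ** 2))),
    "MAE": float(np.mean(np.abs(error))),
}

report = format_results(metrics, algorithm="demo")
print(report)
```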