=========
Tutorials
=========

.. _loading-preprocessing:

Loading and Preprocessing
-------------------------

ImputeGAP comes with several time series datasets. You can find them inside the submodule ``ts.datasets``.

As an example, we start by using eeg-alcohol, a standard dataset composed of individuals with a genetic predisposition to alcoholism. The dataset contains measurements from 64 electrodes placed on the subjects' scalps, sampled at 256 Hz (3.9-ms epoch) for 1 second. The dataset therefore consists of 64 series, each containing 256 values.

.. code-block:: python

    from imputegap.recovery.manager import TimeSeries
    from imputegap.tools import utils

    # initialize the TimeSeries() object
    ts = TimeSeries()
    print(f"ImputeGAP datasets : {ts.datasets}")

    # load the time series from file or from the code
    ts.load_series(utils.search_path("eeg-alcohol"))
    ts.normalize(normalizer="z_score")

    # plot a subset of time series
    ts.plot(input_data=ts.data, nbr_series=9, nbr_val=100, save_path="./imputegap/assets")

    # print a subset of time series
    ts.print(nbr_series=6, nbr_val=20)

.. _contamination:

Contamination
-------------

We now describe how to simulate missing values in the loaded dataset. ImputeGAP implements eight different missingness patterns. You can find them inside the module ``ts.patterns``.

As an example, we show how to contaminate the eeg-alcohol dataset with the MCAR pattern:

.. code-block:: python

    from imputegap.recovery.manager import TimeSeries
    from imputegap.tools import utils

    # initialize the TimeSeries() object
    ts = TimeSeries()
    print(f"Missingness patterns : {ts.patterns}")

    # load and normalize the time series
    ts.load_series(utils.search_path("eeg-alcohol"))
    ts.normalize(normalizer="z_score")

    # contaminate the time series with the MCAR pattern
    ts_m = ts.Contamination.missing_completely_at_random(ts.data, rate_dataset=0.2, rate_series=0.4, block_size=10, seed=True)

    # plot the contaminated time series
    ts.plot(ts.data, ts_m, nbr_series=9, subplot=True, save_path="./imputegap/assets")

If you need to remove data following a specific distribution, please refer to this tutorial.

.. _imputation:

Imputation
----------

In this section, we illustrate how to impute the contaminated time series. Our library implements five families of imputation algorithms: Statistical, Machine Learning, Matrix Completion, Deep Learning, and Pattern Search Methods. You can find the list of algorithms inside the module ``ts.algorithms``.

Imputation can be performed using either default values or user-defined values. To specify the parameters, please use a dictionary in the following format:

.. code-block:: python

    params = {"param_1": 42.1, "param_2": "some_string", "param_3": True}

Let's illustrate the imputation using the CDRec algorithm from the Matrix Completion family.

.. code-block:: python

    from imputegap.recovery.imputation import Imputation
    from imputegap.recovery.manager import TimeSeries
    from imputegap.tools import utils

    # initialize the TimeSeries() object
    ts = TimeSeries()
    print(f"Imputation algorithms : {ts.algorithms}")

    # load and normalize the time series
    ts.load_series(utils.search_path("eeg-alcohol"))
    ts.normalize(normalizer="z_score")

    # contaminate the time series
    ts_m = ts.Contamination.missing_completely_at_random(ts.data)

    # impute the contaminated series
    imputer = Imputation.MatrixCompletion.CDRec(ts_m)
    imputer.impute()

    # compute and print the imputation metrics
    imputer.score(ts.data, imputer.recov_data)
    ts.print_results(imputer.metrics)

    # plot the recovered time series
    ts.plot(input_data=ts.data, incomp_data=ts_m, recov_data=imputer.recov_data, nbr_series=9, subplot=True, save_path="./imputegap/assets")

.. _parameterization:

Parameterization
----------------

The Optimizer component manages algorithm configuration and hyperparameter tuning. To invoke the tuning process, set ``user_def=False`` in the ``impute()`` call and pass the optimization settings through ``params``. This dictionary contains the ground truth (``input_data``), the chosen optimizer (``optimizer``), and the optimizer's options. Several search algorithms are available, including those provided by Ray Tune.

.. code-block:: python

    from imputegap.recovery.imputation import Imputation
    from imputegap.recovery.manager import TimeSeries
    from imputegap.tools import utils

    # initialize the TimeSeries() object
    ts = TimeSeries()
    print(f"AutoML Optimizers : {ts.optimizers}")

    # load and normalize the time series
    ts.load_series(utils.search_path("eeg-alcohol"))
    ts.normalize(normalizer="z_score")

    # contaminate the time series and set up the imputer
    ts_m = ts.Contamination.missing_completely_at_random(ts.data)
    imputer = Imputation.MatrixCompletion.CDRec(ts_m)

    # use Ray Tune to fine-tune the imputation algorithm
    imputer.impute(user_def=False, params={"input_data": ts.data, "optimizer": "ray_tune"})

    # compute and print the imputation metrics
    imputer.score(ts.data, imputer.recov_data)
    ts.print_results(imputer.metrics)

    # plot the recovered time series
    ts.plot(input_data=ts.data, incomp_data=ts_m, recov_data=imputer.recov_data, nbr_series=9, subplot=True, save_path="./imputegap/assets", display=True)

    # save hyperparameters
    utils.save_optimization(optimal_params=imputer.parameters, algorithm=imputer.algorithm, dataset="eeg-alcohol", optimizer="ray_tune")
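
To see why the optimizer needs the ground truth, consider the following self-contained sketch. It uses neither ImputeGAP nor Ray Tune: the moving-average imputer, the ``window`` hyperparameter, and all helper names are hypothetical and serve only as an illustration. Each candidate configuration is scored against the original values at the missing positions, and the configuration with the lowest error is kept.

.. code-block:: python

    import math

    def impute_moving_average(series, window):
        """Toy imputer (illustrative only): fill each None with the mean of
        the observed values found within `window` positions on either side."""
        filled = list(series)
        for i, value in enumerate(series):
            if value is None:
                lo, hi = max(0, i - window), min(len(series), i + window + 1)
                neighbours = [v for v in series[lo:hi] if v is not None]
                # fall back to 0.0 when no observed neighbour is in range
                filled[i] = sum(neighbours) / len(neighbours) if neighbours else 0.0
        return filled

    def rmse_on_missing(truth, incomplete, recovered):
        """RMSE computed only on the positions that were missing."""
        errors = [(t - r) ** 2
                  for t, m, r in zip(truth, incomplete, recovered) if m is None]
        return math.sqrt(sum(errors) / len(errors))

    def grid_search(truth, incomplete, candidates):
        """Score every candidate window against the ground truth; keep the best."""
        scores = {w: rmse_on_missing(truth, incomplete,
                                     impute_moving_average(incomplete, w))
                  for w in candidates}
        best = min(scores, key=scores.get)
        return {"window": best}, scores

    # synthetic series with one block of 10 missing values
    truth = [math.sin(0.3 * t) for t in range(60)]
    incomplete = list(truth)
    for i in range(20, 30):
        incomplete[i] = None

    best_params, scores = grid_search(truth, incomplete, candidates=[1, 3, 5, 10])
    print(f"best parameters : {best_params}")
    print(f"RMSE per window : {scores}")

The returned dictionary has the same shape as the ``params`` dictionary described in the Imputation section, so a tuned configuration could be passed back to an imputer in the same way as user-defined values.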