imputegap.recovery.optimization package

Module contents

class imputegap.recovery.optimization.BaseOptimizer

Bases: object

A base class for optimization of imputation algorithm hyperparameters.

Provides structure and common functionality for different optimization strategies.

Methods

_objective(**kwargs):

Abstract method to evaluate the imputation algorithm with the provided parameters. Must be implemented by subclasses.

optimize(input_data, incomp_data, metrics, algorithm, **kwargs):

Abstract method for the main optimization process. Must be implemented by subclasses.

optimize(input_data, incomp_data, metrics, algorithm, **kwargs)

Abstract method for optimization. Must be implemented in subclasses.

This method performs the optimization of hyperparameters for a given imputation algorithm. Each subclass implements a different optimization strategy (e.g., Greedy, Bayesian, Particle Swarm) and uses the _objective function to evaluate the parameters.

Parameters

input_data : numpy.ndarray

The ground truth time series dataset.

incomp_data : numpy.ndarray

The contaminated time series dataset to impute.

metrics : list of str

List of selected metrics for optimization.

algorithm : str

The imputation algorithm to optimize.

**kwargs : dict

Additional parameters specific to the optimization strategy (e.g., number of iterations, particles, etc.).

Returns

tuple

A tuple containing the best parameters and their corresponding score.
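
The contract above (a subclass supplies _objective and optimize, and optimize returns the best parameters with their score) can be illustrated with a minimal, dependency-free sketch. Everything in it is hypothetical and not part of imputegap: BaseOptimizerSketch is a local stand-in for the documented base class, scaled_mean_impute is a toy imputer, random search is used as the strategy, and nested lists stand in for numpy arrays.

```python
import math
import random


class BaseOptimizerSketch:
    """Hypothetical stand-in mirroring the documented BaseOptimizer interface."""

    def _objective(self, **kwargs):
        raise NotImplementedError

    def optimize(self, input_data, incomp_data, metrics, algorithm, **kwargs):
        raise NotImplementedError


def scaled_mean_impute(incomp_data, scale):
    """Toy imputer: replace each NaN with scale * mean of the row's observed values."""
    recovered = []
    for row in incomp_data:
        observed = [v for v in row if not math.isnan(v)]
        fill = scale * sum(observed) / len(observed)
        recovered.append([fill if math.isnan(v) else v for v in row])
    return recovered


def rmse_on_missing(input_data, incomp_data, recovered):
    """RMSE computed only on the positions that were missing."""
    errors = [
        (truth - guess) ** 2
        for t_row, i_row, r_row in zip(input_data, incomp_data, recovered)
        for truth, seen, guess in zip(t_row, i_row, r_row)
        if math.isnan(seen)
    ]
    return math.sqrt(sum(errors) / len(errors))


class RandomSearch(BaseOptimizerSketch):
    """Random search: sample parameters, score them, keep the best."""

    def _objective(self, input_data, incomp_data, scale):
        recovered = scaled_mean_impute(incomp_data, scale)
        return rmse_on_missing(input_data, incomp_data, recovered)

    def optimize(self, input_data, incomp_data, metrics=("RMSE",),
                 algorithm="scaled_mean", n_calls=50, seed=0):
        # metrics and algorithm are accepted for interface parity only;
        # this toy always scores the scaled-mean imputer with RMSE.
        rng = random.Random(seed)
        best_params, best_score = None, math.inf
        for _ in range(n_calls):
            params = {"scale": rng.uniform(0.0, 2.0)}
            score = self._objective(
                input_data=input_data, incomp_data=incomp_data, **params
            )
            if score < best_score:
                best_params, best_score = params, score
        return best_params, best_score


truth = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
contaminated = [[1.0, math.nan, 3.0, 4.0], [2.0, 4.0, math.nan, 8.0]]
best_params, best_score = RandomSearch().optimize(truth, contaminated)
print(best_params, best_score)
```

The returned pair has the same shape as the documented return value: a parameter dictionary and the score it achieved.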

class imputegap.recovery.optimization.Optimization

Bases: object

A class for performing optimization of imputation algorithm hyperparameters.

This class groups the optimization strategies (Greedy, Bayesian, Particle Swarm, Successive Halving, and Ray Tune) used to find the best parameters for different imputation algorithms.

Methods

Greedy.optimize(input_data, incomp_data, metrics=["RMSE"], algorithm="cdrec", n_calls=250):

Perform greedy optimization for hyperparameters.

Bayesian.optimize(input_data, incomp_data, metrics=["RMSE"], algorithm="cdrec", n_calls=100, n_random_starts=50, acq_func="gp_hedge"):

Perform Bayesian optimization for hyperparameters.

ParticleSwarm.optimize(input_data, incomp_data, metrics, algorithm, n_particles, c1, c2, w, iterations, n_processes):

Perform Particle Swarm Optimization (PSO) for hyperparameters.

SuccessiveHalving.optimize(input_data, incomp_data, metrics, algorithm, num_configs, num_iterations, reduction_factor):

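Perform Successive Halving optimization for hyperparameters.

RayTune.optimize(input_data, incomp_data, metrics=["RMSE"], algorithm="cdrec", n_calls=1, max_concurrent_trials=-1):

Perform Ray Tune optimization for hyperparameters.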

class Bayesian

Bases: BaseOptimizer

Bayesian optimization strategy for hyperparameters.

optimize(input_data, incomp_data, metrics=['RMSE'], algorithm='cdrec', n_calls=100, n_random_starts=50, acq_func='gp_hedge')

Perform Bayesian optimization for hyperparameters.

Parameters

input_data : numpy.ndarray

The ground truth time series dataset.

incomp_data : numpy.ndarray

The contaminated time series dataset to impute.

metrics : list of str, optional

List of selected metrics for optimization (default is ["RMSE"]).

algorithm : str, optional

The imputation algorithm to optimize (default is 'cdrec').

n_calls : int, optional

Number of calls to the objective function (default is 100).

n_random_starts : int, optional

Number of random initial points (default is 50).

acq_func : str, optional

Acquisition function for the Gaussian prior (default is 'gp_hedge').

Returns

tuple

A tuple containing the best parameters and their corresponding score.
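
The relationship between n_calls and n_random_starts can be sketched as follows. This assumes the underlying optimizer counts the random starts as part of n_calls, as scikit-optimize's gp_minimize does; the source does not state which backend is used. split_budget and run_bayesian are hypothetical helper names, and instantiating Optimization.Bayesian without arguments is also an assumption.

```python
def split_budget(n_calls=100, n_random_starts=50):
    """Return (random_calls, model_guided_calls) for a given evaluation budget."""
    if n_random_starts > n_calls:
        raise ValueError("n_random_starts cannot exceed n_calls")
    return n_random_starts, n_calls - n_random_starts


def run_bayesian(input_data, incomp_data, n_calls=100, n_random_starts=50):
    """Call the documented Bayesian.optimize; requires imputegap to be installed."""
    # Imported lazily so the budget helper works without the library.
    from imputegap.recovery.optimization import Optimization

    split_budget(n_calls, n_random_starts)  # fail early on an inconsistent budget
    # Assumption: the nested strategy class takes no constructor arguments.
    optimizer = Optimization.Bayesian()
    return optimizer.optimize(
        input_data=input_data,
        incomp_data=incomp_data,
        metrics=["RMSE"],
        algorithm="cdrec",
        n_calls=n_calls,
        n_random_starts=n_random_starts,
        acq_func="gp_hedge",
    )


print(split_budget())  # (50, 50)
```

Under that assumption, the documented defaults spend half of the budget on random exploration before the surrogate model starts guiding the search.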

class Greedy

Bases: BaseOptimizer

Greedy optimization strategy for hyperparameters.

optimize(input_data, incomp_data, metrics=['RMSE'], algorithm='cdrec', n_calls=250)

Perform greedy optimization for hyperparameters.

Parameters

input_data : numpy.ndarray

The ground truth time series dataset.

incomp_data : numpy.ndarray

The contaminated time series dataset to impute.

metrics : list of str, optional

List of selected metrics for optimization (default is ["RMSE"]).

algorithm : str, optional

The imputation algorithm to optimize (default is 'cdrec').

n_calls : int, optional

Number of calls to the objective function (default is 250).

Returns

tuple

A tuple containing the best parameters and their corresponding score.

class ParticleSwarm

Bases: BaseOptimizer

Particle Swarm Optimization (PSO) strategy for hyperparameters.

optimize(input_data, incomp_data, metrics, algorithm, n_particles, c1, c2, w, iterations, n_processes)

Perform Particle Swarm Optimization for hyperparameters.

Parameters

input_data : numpy.ndarray

The ground truth time series dataset.

incomp_data : numpy.ndarray

The contaminated time series dataset to impute.

metrics : list of str

List of selected metrics for optimization (for example ["RMSE"]).

algorithm : str

The imputation algorithm to optimize (for example 'cdrec').

n_particles : int

Number of particles used in PSO.

c1 : float

PSO parameter, personal learning coefficient.

c2 : float

PSO parameter, global learning coefficient.

w : float

PSO parameter, inertia weight.

iterations : int

Number of iterations for the optimization.

n_processes : int

Number of processes used during optimization.

Returns

tuple

A tuple containing the best parameters and their corresponding score.
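
The roles of n_particles, c1, c2, w, and iterations can be seen in the textbook PSO update, sketched below on a toy objective. This is a generic illustration, not imputegap's implementation: the sphere objective, the bounds, and the seed are hypothetical, and n_processes (parallel evaluation) is not modelled.

```python
import random


def sphere(position):
    """Toy objective with its minimum (0.0) at the origin."""
    return sum(x * x for x in position)


def pso(objective, dim, n_particles, c1, c2, w, iterations,
        bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=personal_score.__getitem__)
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    w * velocities[i][d]                                   # inertia
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])   # personal pull
                    + c2 * r2 * (global_best[d] - positions[i][d])        # global pull
                )
                # Move the particle and clamp it to the search bounds.
                positions[i][d] = min(high, max(low, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


best_position, best_score = pso(
    sphere, dim=2, n_particles=20, c1=1.5, c2=1.5, w=0.7, iterations=100
)
print(best_position, best_score)
```

A larger w keeps particles moving along their current direction, while c1 and c2 weight the attraction toward each particle's own best position and the swarm's best position, respectively.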

class RayTune

Bases: BaseOptimizer

RayTune optimization strategy for hyperparameters.

optimize(input_data, incomp_data, metrics=['RMSE'], algorithm='cdrec', n_calls=1, max_concurrent_trials=-1)

Perform Ray Tune optimization for hyperparameters.

Parameters

input_data : numpy.ndarray

The ground truth time series dataset.

incomp_data : numpy.ndarray

The contaminated time series dataset to impute.

metrics : list of str, optional

List of selected metrics for optimization (default is ["RMSE"]).

algorithm : str, optional

The imputation algorithm to optimize (default is 'cdrec').

n_calls : int, optional

Number of calls to the objective function (default is 1).

max_concurrent_trials : int, optional

Number of trials run in parallel, limited by the available memory, CPU, and GPU (default is -1). Increase the value if more resources are available.

Returns

tuple

A tuple containing the best parameters and their corresponding score.

class SuccessiveHalving

Bases: BaseOptimizer

Successive Halving optimization strategy for hyperparameters.

optimize(input_data, incomp_data, metrics, algorithm, num_configs, num_iterations, reduction_factor)

Perform Successive Halving optimization for hyperparameters.

Parameters

input_data : numpy.ndarray

The ground truth time series dataset.

incomp_data : numpy.ndarray

The contaminated time series dataset to impute.

metrics : list of str

List of selected metrics for optimization (for example ["RMSE"]).

algorithm : str

The imputation algorithm to optimize (for example 'cdrec').

num_configs : int

Number of configurations to try.

num_iterations : int

Number of iterations for the optimization.

reduction_factor : int

Reduction factor for the number of configurations kept after each iteration.

Returns

tuple

A tuple containing the best parameters and their corresponding score.
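
How num_configs, num_iterations, and reduction_factor interact can be sketched with the generic Successive Halving scheme: evaluate every surviving configuration, keep the best 1/reduction_factor of them, and repeat. The code below is an illustration under stated assumptions, not imputegap's implementation; sample_config, evaluate, and the growing budget argument are hypothetical stand-ins.

```python
import random


def sample_config(rng):
    """Draw one hypothetical configuration with a single parameter x."""
    return {"x": rng.uniform(0.0, 6.0)}


def evaluate(config, budget):
    """Toy loss: best at x = 3, with a penalty that shrinks as the budget grows."""
    return (config["x"] - 3.0) ** 2 * (1.0 + 1.0 / budget)


def successive_halving(num_configs, num_iterations, reduction_factor, seed=0):
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(num_configs)]
    history = [len(configs)]  # number of surviving configurations per round
    budget = 1
    for iteration in range(num_iterations):
        # Survivors get a larger evaluation budget in each round.
        budget = reduction_factor ** iteration
        ranked = sorted(configs, key=lambda c: evaluate(c, budget))
        keep = max(1, len(ranked) // reduction_factor)
        configs = ranked[:keep]
        history.append(len(configs))
        if len(configs) == 1:
            break
    best = configs[0]
    return best, evaluate(best, budget), history


best, score, history = successive_halving(
    num_configs=27, num_iterations=5, reduction_factor=3
)
print(history)  # [27, 9, 3, 1]
```

With 27 configurations and a reduction factor of 3, the pool collapses to a single survivor after three rounds, so the remaining iterations are never used.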