imputegap.recovery.benchmark package

The imputegap.recovery.benchmark package provides utility functions and tools for testing the library.

class imputegap.recovery.benchmark.Benchmark[source]

Bases: object

A class to evaluate the performance of imputation algorithms through benchmarking across datasets and patterns.

Methods

average_runs_by_names(data):

Average the results of all runs depending on the dataset.

avg_results():

Calculate average metrics (e.g., RMSE) across multiple datasets and algorithm runs.

generate_heatmap():

Generate and save a heatmap visualization of RMSE scores for datasets and algorithms.

generate_reports_txt():

Create detailed text-based reports summarizing metrics and timing results for all evaluations.

generate_reports_excel():

Create detailed Excel-based reports summarizing metrics and timing results for all evaluations.

generate_plots():

Visualize metrics (e.g., RMSE, MAE) and timing (e.g., imputation, optimization) across patterns and datasets.

eval():

Perform a complete benchmarking pipeline, including contamination, imputation, evaluation, and reporting.

Example

output: {'drift': {'mcar': {'mean': {'bayesian': {'0.05': {'scores': {'RMSE': 0.9234927128429051, 'MAE': 0.7219362152785619, 'MI': 0.0, 'CORRELATION': 0}, 'times': {'contamination': 0.0010309219360351562, 'optimization': 0, 'imputation': 0.0005755424499511719}}, '0.1': {'scores': {'RMSE': 0.9699990038879407, 'MAE': 0.7774057495176013, 'MI': 0.0, 'CORRELATION': 0}, 'times': {'contamination': 0.0020699501037597656, 'optimization': 0, 'imputation': 0.00048422813415527344}}, '0.2': {'scores': {'RMSE': 0.9914069853975623, 'MAE': 0.8134840739732964, 'MI': 0.0, 'CORRELATION': 0}, 'times': {'contamination': 0.007096290588378906, 'optimization': 0, 'imputation': 0.000461578369140625}}, '0.4': {'scores': {'RMSE': 1.0552448338389784, 'MAE': 0.7426695186604741, 'MI': 0.0, 'CORRELATION': 0}, 'times': {'contamination': 0.043192148208618164, 'optimization': 0, 'imputation': 0.0005095005035400391}}, '0.6': {'scores': {'RMSE': 1.0143105930114702, 'MAE': 0.7610548321723654, 'MI': 0.0, 'CORRELATION': 0}, 'times': {'contamination': 0.17184901237487793, 'optimization': 0, 'imputation': 0.0005536079406738281}}, '0.8': {'scores': {'RMSE': 1.010712060535523, 'MAE': 0.7641520748788702, 'MI': 0.0, 'CORRELATION': 0}, 'times': {'contamination': 0.6064670085906982, 'optimization': 0, 'imputation': 0.0005743503570556641}}}},
'cdrec': {'bayesian': {'0.05': {'scores': {'RMSE': 0.23303624184873978, 'MAE': 0.13619797235197734, 'MI': 1.2739817718416822, 'CORRELATION': 0.968435455112644}, 'times': {'contamination': 0.0009615421295166016, 'optimization': 0, 'imputation': 0.09218788146972656}}, '0.1': {'scores': {'RMSE': 0.18152059329152104, 'MAE': 0.09925566629402761, 'MI': 1.1516089897042538, 'CORRELATION': 0.9829398352220718}, 'times': {'contamination': 0.00482487678527832, 'optimization': 0, 'imputation': 0.09549617767333984}}, '0.2': {'scores': {'RMSE': 0.13894771223733138, 'MAE': 0.08459032692102293, 'MI': 1.186191167936035, 'CORRELATION': 0.9901338133811375}, 'times': {'contamination': 0.01713728904724121, 'optimization': 0, 'imputation': 0.1129295825958252}}, '0.4': {'scores': {'RMSE': 0.7544523683503829, 'MAE': 0.11218049973594252, 'MI': 0.021165172206064526, 'CORRELATION': 0.814120507570725}, 'times': {'contamination': 0.10881781578063965, 'optimization': 0, 'imputation': 1.9378046989440918}}, '0.6': {'scores': {'RMSE': 0.4355197572001326, 'MAE': 0.1380846624733049, 'MI': 0.10781252370591506, 'CORRELATION': 0.9166777087122915}, 'times': {'contamination': 0.2380077838897705, 'optimization': 0, 'imputation': 1.8785057067871094}}, '0.8': {'scores': {'RMSE': 0.7672558930795506, 'MAE': 0.32988968428439397, 'MI': 0.013509125598802707, 'CORRELATION': 0.7312998041323675}, 'times': {'contamination': 0.6805167198181152, 'optimization': 0, 'imputation': 1.9562773704528809}}}},
'stmvl': {'bayesian': {'0.05': {'scores': {'RMSE': 0.5434405584289141, 'MAE': 0.346560495723809, 'MI': 0.7328867182584357, 'CORRELATION': 0.8519431955571422}, 'times': {'contamination': 0.0022056102752685547, 'optimization': 0, 'imputation': 52.07010293006897}}, '0.1': {'scores': {'RMSE': 0.39007056542870916, 'MAE': 0.2753022759369617, 'MI': 0.8280959876205578, 'CORRELATION': 0.9180937736429735}, 'times': {'contamination': 0.002231597900390625, 'optimization': 0, 'imputation': 52.543020248413086}}, '0.2': {'scores': {'RMSE': 0.37254427425455994, 'MAE': 0.2730547993858495, 'MI': 0.7425412593844177, 'CORRELATION': 0.9293322959355041}, 'times': {'contamination': 0.0072672367095947266, 'optimization': 0, 'imputation': 52.88247036933899}}, '0.4': {'scores': {'RMSE': 0.6027573766269363, 'MAE': 0.34494332493982044, 'MI': 0.11876685901414151, 'CORRELATION': 0.8390532279447225}, 'times': {'contamination': 0.04321551322937012, 'optimization': 0, 'imputation': 54.10793352127075}}, '0.6': {'scores': {'RMSE': 0.9004526656857551, 'MAE': 0.4924048353228427, 'MI': 0.011590260996247858, 'CORRELATION': 0.5650541301828254}, 'times': {'contamination': 0.1728806495666504, 'optimization': 0, 'imputation': 40.53373336791992}}, '0.8': {'scores': {'RMSE': 1.0112488396023014, 'MAE': 0.7646823531588104, 'MI': 0.00040669209664367576, 'CORRELATION': 0.0183962968474991}, 'times': {'contamination': 0.6077785491943359, 'optimization': 0, 'imputation': 35.151907444000244}}}},
'iim': {'bayesian': {'0.05': {'scores': {'RMSE': 0.4445625930776235, 'MAE': 0.2696133927362288, 'MI': 1.1167751522591498, 'CORRELATION': 0.8944975075266335}, 'times': {'contamination': 0.0010058879852294922, 'optimization': 0, 'imputation': 0.7380530834197998}}, '0.1': {'scores': {'RMSE': 0.2939506418814281, 'MAE': 0.16953644212278182, 'MI': 1.0160968166750064, 'CORRELATION': 0.9531900627237018}, 'times': {'contamination': 0.0019745826721191406, 'optimization': 0, 'imputation': 4.7826457023620605}}, '0.2': {'scores': {'RMSE': 0.2366529609250008, 'MAE': 0.14709529129218185, 'MI': 1.064299483512458, 'CORRELATION': 0.9711348247027318}, 'times': {'contamination': 0.00801849365234375, 'optimization': 0, 'imputation': 33.94813060760498}}, '0.4': {'scores': {'RMSE': 0.4155649406397416, 'MAE': 0.22056702659999994, 'MI': 0.06616526470761779, 'CORRELATION': 0.919934494058292}, 'times': {'contamination': 0.04391813278198242, 'optimization': 0, 'imputation': 255.31524085998535}}, '0.6': {'scores': {'RMSE': 0.38695094864012947, 'MAE': 0.24340565131372927, 'MI': 0.06361822797740405, 'CORRELATION': 0.9249744935121553}, 'times': {'contamination': 0.17044353485107422, 'optimization': 0, 'imputation': 840.7470128536224}}, '0.8': {'scores': {'RMSE': 0.5862696375344495, 'MAE': 0.3968159514130716, 'MI': 0.13422239939628303, 'CORRELATION': 0.8178796825899766}, 'times': {'contamination': 0.5999574661254883, 'optimization': 0, 'imputation': 1974.6101157665253}}}},
'mrnn': {'bayesian': {'0.05': {'scores': {'RMSE': 0.9458508648057621, 'MAE': 0.7019459696903068, 'MI': 0.11924522547609226, 'CORRELATION': 0.02915935932568557}, 'times': {'contamination': 0.001056671142578125, 'optimization': 0, 'imputation': 49.42237901687622}}, '0.1': {'scores': {'RMSE': 1.0125309431502871, 'MAE': 0.761136543268339, 'MI': 0.12567590499764303, 'CORRELATION': -0.037161060882302754}, 'times': {'contamination': 0.003415822982788086, 'optimization': 0, 'imputation': 49.04829454421997}}, '0.2': {'scores': {'RMSE': 1.0317754516097355, 'MAE': 0.7952869439926, 'MI': 0.10908095436833125, 'CORRELATION': -0.04155403791391449}, 'times': {'contamination': 0.007429599761962891, 'optimization': 0, 'imputation': 49.42568325996399}}, '0.4': {'scores': {'RMSE': 1.0807965786089415, 'MAE': 0.7326965517264863, 'MI': 0.006171770470542263, 'CORRELATION': -0.020630168509677818}, 'times': {'contamination': 0.042899370193481445, 'optimization': 0, 'imputation': 49.479795694351196}}, '0.6': {'scores': {'RMSE': 1.0441472017887297, 'MAE': 0.7599852461729673, 'MI': 0.01121013333181846, 'CORRELATION': -0.007513931343350665}, 'times': {'contamination': 0.17329692840576172, 'optimization': 0, 'imputation': 50.439927101135254}}, '0.8': {'scores': {'RMSE': 1.0379347892718205, 'MAE': 0.757440007226372, 'MI': 0.0035880775657246428, 'CORRELATION': -0.0014975078469404196}, 'times': {'contamination': 0.6166613101959229, 'optimization': 0, 'imputation': 50.66455388069153}}}}}}}
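
A minimal sketch (using values taken from the example output above) of how to read one entry out of this nested dictionary; the key order dataset, pattern, algorithm, optimizer, missing rate is inferred from the example:

results = {
    "drift": {"mcar": {"cdrec": {"bayesian": {"0.2": {
        "scores": {"RMSE": 0.13894771223733138, "MAE": 0.08459032692102293,
                   "MI": 1.186191167936035, "CORRELATION": 0.9901338133811375},
        "times": {"contamination": 0.01713728904724121, "optimization": 0,
                  "imputation": 0.1129295825958252},
    }}}}}
}

# Walk the nesting: dataset -> pattern -> algorithm -> optimizer -> missing rate.
entry = results["drift"]["mcar"]["cdrec"]["bayesian"]["0.2"]
print("CDRec RMSE at a 0.2 missing rate:", entry["scores"]["RMSE"])
print("Imputation time (s):", entry["times"]["imputation"])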

average_runs_by_names(data)[source]

Average the results of all runs depending on the dataset.

Parameters

data : list

List of dictionaries containing the results of the benchmark runs.

Returns

list

List of dictionaries containing the results of the benchmark runs, averaged by dataset.

avg_results(*datasets)[source]

Calculate the average of all metrics and times across multiple datasets.

Parameters

datasets : dict

Multiple dataset dictionaries to be averaged.

Returns

list

Matrix with averaged scores and times for all missing-rate levels, the list of algorithms, and the list of datasets.

eval(algorithms=['cdrec'], datasets=['eeg-alcohol'], patterns=['mcar'], x_axis=[0.05, 0.1, 0.2, 0.4, 0.6, 0.8], optimizers=['user_def'], save_dir='./reports', runs=1)[source]

Execute a comprehensive evaluation of imputation algorithms over multiple datasets and patterns.

Parameters

algorithms : list of str

List of imputation algorithms to test.

datasets : list of str

List of dataset names to evaluate.

patterns : list of str

List of contamination patterns to apply.

x_axis : list of float

List of missing rates for contamination.

optimizers : list of dict

List of optimizers with their configurations.

save_dir : str, optional

Directory to save reports and plots (default is “./reports”).

runs : int, optional

Number of runs to execute; results are averaged across runs.

Returns

list

Results of all runs and a matrix with averaged scores and times for all missing-rate levels.

Notes

Runs contamination, imputation, and evaluation, then generates plots and a summary report.
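
A minimal usage sketch of the full pipeline; the argument values below simply restate the defaults shown in the signature above and can be replaced with any supported algorithms, datasets, patterns, and optimizers:

from imputegap.recovery.benchmark import Benchmark

bench = Benchmark()

# Contaminate, impute, evaluate, and report for every combination of
# algorithm, dataset, pattern, and missing rate listed below.
results = bench.eval(
    algorithms=["cdrec"],
    datasets=["eeg-alcohol"],
    patterns=["mcar"],
    x_axis=[0.05, 0.1, 0.2, 0.4, 0.6, 0.8],
    optimizers=["user_def"],
    save_dir="./reports",
    runs=1,
)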

generate_heatmap(scores_list, algos, sets, save_dir='./reports', display=True)[source]

Generate and save a high-resolution heatmap of RMSE scores.

Parameters

scores_list : np.ndarray

2D numpy array containing RMSE values.

algos : list of str

List of algorithm names (columns of the heatmap).

sets : list of str

List of dataset names (rows of the heatmap).

save_dir : str, optional

Directory to save the generated plot (default is “./reports”).

display : bool, optional

Whether to display the plot (default is True).

Returns

bool

True if the heatmap has been generated.
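
A minimal sketch of a standalone call with illustrative RMSE values; per the parameter descriptions above, rows correspond to datasets and columns to algorithms:

import numpy as np

from imputegap.recovery.benchmark import Benchmark

# Illustrative RMSE values: one row per dataset, one column per algorithm.
rmse_scores = np.array([[0.23, 0.54, 0.95],
                        [0.18, 0.39, 1.01]])
algos = ["cdrec", "stmvl", "mrnn"]
sets = ["drift", "eeg-alcohol"]

Benchmark().generate_heatmap(rmse_scores, algos, sets,
                             save_dir="./reports", display=False)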

generate_plots(runs_plots_scores, ticks, subplot=False, y_size=4, save_dir='./reports')[source]

Generate and save plots for each metric and pattern based on provided scores.

Parameters

runs_plots_scores : dict

Dictionary containing scores and timing information for each dataset, pattern, and algorithm.

ticks : list of float

List of missing rates for contamination.

subplot : bool, optional

If True, generates a single figure with subplots for all metrics (default is False).

y_size : int, optional

Vertical size of the generated figures (default is 4).

save_dir : str, optional

Directory to save generated plots (default is “./reports”).

Returns

None

Notes

Saves generated plots in save_dir, categorized by dataset, pattern, and metric.

generate_reports_excel(runs_plots_scores, save_dir='./reports', dataset='', run=-1)[source]

Generate and save an Excel-like text report of metrics and timing for each dataset, algorithm, and pattern.

Parameters

runs_plots_scores : dict

Dictionary containing scores and timing information for each dataset, pattern, and algorithm.

save_dir : str, optional

Directory to save the Excel-like file (default is “./reports”).

dataset : str, optional

Name of the dataset, used in the Excel-like file name.

run : int, optional

Number of the run.

Returns

None

generate_reports_txt(runs_plots_scores, save_dir='./reports', dataset='', run=-1)[source]

Generate and save a text report of metrics and timing for each dataset, algorithm, and pattern.

Parameters

runs_plots_scores : dict

Dictionary containing scores and timing information for each dataset, pattern, and algorithm.

save_dir : str, optional

Directory to save the report file (default is “./reports”).

dataset : str, optional

Name of the dataset, used in the report file name.

run : int, optional

Number of the run.

Returns

None

Notes

The report is saved in a “report.txt” file in save_dir, organized in sections with headers and results.
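
An illustrative call of the two report generators; the shape of runs_plots_scores below is inferred from the example output at the top of this page (dataset, pattern, algorithm, optimizer, missing rate), not taken from a documented contract:

from imputegap.recovery.benchmark import Benchmark

# One-entry dictionary shaped like the example output above (an assumed shape).
runs_plots_scores = {
    "drift": {"mcar": {"cdrec": {"bayesian": {"0.2": {
        "scores": {"RMSE": 0.139, "MAE": 0.085, "MI": 1.186, "CORRELATION": 0.990},
        "times": {"contamination": 0.017, "optimization": 0, "imputation": 0.113},
    }}}}}
}

bench = Benchmark()
bench.generate_reports_txt(runs_plots_scores, save_dir="./reports", dataset="drift", run=0)
bench.generate_reports_excel(runs_plots_scores, save_dir="./reports", dataset="drift", run=0)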
