imputegap.recovery.benchmark package¶
The imputegap.recovery.benchmark package provides various utility functions and tools for testing the library.
Modules¶
- class imputegap.recovery.benchmark.Benchmark[source]¶
Bases: object
A class to evaluate the performance of imputation algorithms through benchmarking across datasets and patterns.
Methods¶
- average_runs_by_names(self, data):
Average the results of all runs depending on the dataset.
- avg_results():
Calculate average metrics (e.g., RMSE) across multiple datasets and algorithm runs.
- generate_heatmap():
Generate and save a heatmap visualization of RMSE scores for datasets and algorithms.
- generate_reports_txt():
Create detailed text-based reports summarizing metrics and timing results for all evaluations.
- generate_reports_excel():
Create detailed Excel-based reports summarizing metrics and timing results for all evaluations.
- generate_plots():
Visualize metrics (e.g., RMSE, MAE) and timing (e.g., imputation, optimization) across patterns and datasets.
- eval():
Perform a complete benchmarking pipeline, including contamination, imputation, evaluation, and reporting.
Example¶
output : {‘drift’: {‘mcar’: {‘mean’: {‘bayesian’: {‘0.05’: {‘scores’: {‘RMSE’: 0.9234927128429051, ‘MAE’: 0.7219362152785619, ‘MI’: 0.0, ‘CORRELATION’: 0}, ‘times’: {‘contamination’: 0.0010309219360351562, ‘optimization’: 0, ‘imputation’: 0.0005755424499511719}}, ‘0.1’: {‘scores’: {‘RMSE’: 0.9699990038879407, ‘MAE’: 0.7774057495176013, ‘MI’: 0.0, ‘CORRELATION’: 0}, ‘times’: {‘contamination’: 0.0020699501037597656, ‘optimization’: 0, ‘imputation’: 0.00048422813415527344}}, ‘0.2’: {‘scores’: {‘RMSE’: 0.9914069853975623, ‘MAE’: 0.8134840739732964, ‘MI’: 0.0, ‘CORRELATION’: 0}, ‘times’: {‘contamination’: 0.007096290588378906, ‘optimization’: 0, ‘imputation’: 0.000461578369140625}}, ‘0.4’: {‘scores’: {‘RMSE’: 1.0552448338389784, ‘MAE’: 0.7426695186604741, ‘MI’: 0.0, ‘CORRELATION’: 0}, ‘times’: {‘contamination’: 0.043192148208618164, ‘optimization’: 0, ‘imputation’: 0.0005095005035400391}}, ‘0.6’: {‘scores’: {‘RMSE’: 1.0143105930114702, ‘MAE’: 0.7610548321723654, ‘MI’: 0.0, ‘CORRELATION’: 0}, ‘times’: {‘contamination’: 0.17184901237487793, ‘optimization’: 0, ‘imputation’: 0.0005536079406738281}}, ‘0.8’: {‘scores’: {‘RMSE’: 1.010712060535523, ‘MAE’: 0.7641520748788702, ‘MI’: 0.0, ‘CORRELATION’: 0}, ‘times’: {‘contamination’: 0.6064670085906982, ‘optimization’: 0, ‘imputation’: 0.0005743503570556641}}}}, ‘cdrec’: {‘bayesian’: {‘0.05’: {‘scores’: {‘RMSE’: 0.23303624184873978, ‘MAE’: 0.13619797235197734, ‘MI’: 1.2739817718416822, ‘CORRELATION’: 0.968435455112644}, ‘times’: {‘contamination’: 0.0009615421295166016, ‘optimization’: 0, ‘imputation’: 0.09218788146972656}}, ‘0.1’: {‘scores’: {‘RMSE’: 0.18152059329152104, ‘MAE’: 0.09925566629402761, ‘MI’: 1.1516089897042538, ‘CORRELATION’: 0.9829398352220718}, ‘times’: {‘contamination’: 0.00482487678527832, ‘optimization’: 0, ‘imputation’: 0.09549617767333984}}, ‘0.2’: {‘scores’: {‘RMSE’: 0.13894771223733138, ‘MAE’: 0.08459032692102293, ‘MI’: 1.186191167936035, ‘CORRELATION’: 0.9901338133811375}, ‘times’: {‘contamination’: 0.01713728904724121, ‘optimization’: 0, ‘imputation’: 0.1129295825958252}}, ‘0.4’: {‘scores’: {‘RMSE’: 0.7544523683503829, ‘MAE’: 0.11218049973594252, ‘MI’: 0.021165172206064526, ‘CORRELATION’: 0.814120507570725}, ‘times’: {‘contamination’: 0.10881781578063965, ‘optimization’: 0, ‘imputation’: 1.9378046989440918}}, ‘0.6’: {‘scores’: {‘RMSE’: 0.4355197572001326, ‘MAE’: 0.1380846624733049, ‘MI’: 0.10781252370591506, ‘CORRELATION’: 0.9166777087122915}, ‘times’: {‘contamination’: 0.2380077838897705, ‘optimization’: 0, ‘imputation’: 1.8785057067871094}}, ‘0.8’: {‘scores’: {‘RMSE’: 0.7672558930795506, ‘MAE’: 0.32988968428439397, ‘MI’: 0.013509125598802707, ‘CORRELATION’: 0.7312998041323675}, ‘times’: {‘contamination’: 0.6805167198181152, ‘optimization’: 0, ‘imputation’: 1.9562773704528809}}}}, ‘stmvl’: {‘bayesian’: {‘0.05’: {‘scores’: {‘RMSE’: 0.5434405584289141, ‘MAE’: 0.346560495723809, ‘MI’: 0.7328867182584357, ‘CORRELATION’: 0.8519431955571422}, ‘times’: {‘contamination’: 0.0022056102752685547, ‘optimization’: 0, ‘imputation’: 52.07010293006897}}, ‘0.1’: {‘scores’: {‘RMSE’: 0.39007056542870916, ‘MAE’: 0.2753022759369617, ‘MI’: 0.8280959876205578, ‘CORRELATION’: 0.9180937736429735}, ‘times’: {‘contamination’: 0.002231597900390625, ‘optimization’: 0, ‘imputation’: 52.543020248413086}}, ‘0.2’: {‘scores’: {‘RMSE’: 0.37254427425455994, ‘MAE’: 0.2730547993858495, ‘MI’: 0.7425412593844177, ‘CORRELATION’: 0.9293322959355041}, ‘times’: {‘contamination’: 0.0072672367095947266, ‘optimization’: 0, ‘imputation’: 52.88247036933899}}, ‘0.4’: {‘scores’: 
{‘RMSE’: 0.6027573766269363, ‘MAE’: 0.34494332493982044, ‘MI’: 0.11876685901414151, ‘CORRELATION’: 0.8390532279447225}, ‘times’: {‘contamination’: 0.04321551322937012, ‘optimization’: 0, ‘imputation’: 54.10793352127075}}, ‘0.6’: {‘scores’: {‘RMSE’: 0.9004526656857551, ‘MAE’: 0.4924048353228427, ‘MI’: 0.011590260996247858, ‘CORRELATION’: 0.5650541301828254}, ‘times’: {‘contamination’: 0.1728806495666504, ‘optimization’: 0, ‘imputation’: 40.53373336791992}}, ‘0.8’: {‘scores’: {‘RMSE’: 1.0112488396023014, ‘MAE’: 0.7646823531588104, ‘MI’: 0.00040669209664367576, ‘CORRELATION’: 0.0183962968474991}, ‘times’: {‘contamination’: 0.6077785491943359, ‘optimization’: 0, ‘imputation’: 35.151907444000244}}}}, ‘iim’: {‘bayesian’: {‘0.05’: {‘scores’: {‘RMSE’: 0.4445625930776235, ‘MAE’: 0.2696133927362288, ‘MI’: 1.1167751522591498, ‘CORRELATION’: 0.8944975075266335}, ‘times’: {‘contamination’: 0.0010058879852294922, ‘optimization’: 0, ‘imputation’: 0.7380530834197998}}, ‘0.1’: {‘scores’: {‘RMSE’: 0.2939506418814281, ‘MAE’: 0.16953644212278182, ‘MI’: 1.0160968166750064, ‘CORRELATION’: 0.9531900627237018}, ‘times’: {‘contamination’: 0.0019745826721191406, ‘optimization’: 0, ‘imputation’: 4.7826457023620605}}, ‘0.2’: {‘scores’: {‘RMSE’: 0.2366529609250008, ‘MAE’: 0.14709529129218185, ‘MI’: 1.064299483512458, ‘CORRELATION’: 0.9711348247027318}, ‘times’: {‘contamination’: 0.00801849365234375, ‘optimization’: 0, ‘imputation’: 33.94813060760498}}, ‘0.4’: {‘scores’: {‘RMSE’: 0.4155649406397416, ‘MAE’: 0.22056702659999994, ‘MI’: 0.06616526470761779, ‘CORRELATION’: 0.919934494058292}, ‘times’: {‘contamination’: 0.04391813278198242, ‘optimization’: 0, ‘imputation’: 255.31524085998535}}, ‘0.6’: {‘scores’: {‘RMSE’: 0.38695094864012947, ‘MAE’: 0.24340565131372927, ‘MI’: 0.06361822797740405, ‘CORRELATION’: 0.9249744935121553}, ‘times’: {‘contamination’: 0.17044353485107422, ‘optimization’: 0, ‘imputation’: 840.7470128536224}}, ‘0.8’: {‘scores’: {‘RMSE’: 0.5862696375344495, ‘MAE’: 0.3968159514130716, ‘MI’: 0.13422239939628303, ‘CORRELATION’: 0.8178796825899766}, ‘times’: {‘contamination’: 0.5999574661254883, ‘optimization’: 0, ‘imputation’: 1974.6101157665253}}}}, ‘mrnn’: {‘bayesian’: {‘0.05’: {‘scores’: {‘RMSE’: 0.9458508648057621, ‘MAE’: 0.7019459696903068, ‘MI’: 0.11924522547609226, ‘CORRELATION’: 0.02915935932568557}, ‘times’: {‘contamination’: 0.001056671142578125, ‘optimization’: 0, ‘imputation’: 49.42237901687622}}, ‘0.1’: {‘scores’: {‘RMSE’: 1.0125309431502871, ‘MAE’: 0.761136543268339, ‘MI’: 0.12567590499764303, ‘CORRELATION’: -0.037161060882302754}, ‘times’: {‘contamination’: 0.003415822982788086, ‘optimization’: 0, ‘imputation’: 49.04829454421997}}, ‘0.2’: {‘scores’: {‘RMSE’: 1.0317754516097355, ‘MAE’: 0.7952869439926, ‘MI’: 0.10908095436833125, ‘CORRELATION’: -0.04155403791391449}, ‘times’: {‘contamination’: 0.007429599761962891, ‘optimization’: 0, ‘imputation’: 49.42568325996399}}, ‘0.4’: {‘scores’: {‘RMSE’: 1.0807965786089415, ‘MAE’: 0.7326965517264863, ‘MI’: 0.006171770470542263, ‘CORRELATION’: -0.020630168509677818}, ‘times’: {‘contamination’: 0.042899370193481445, ‘optimization’: 0, ‘imputation’: 49.479795694351196}}, ‘0.6’: {‘scores’: {‘RMSE’: 1.0441472017887297, ‘MAE’: 0.7599852461729673, ‘MI’: 0.01121013333181846, ‘CORRELATION’: -0.007513931343350665}, ‘times’: {‘contamination’: 0.17329692840576172, ‘optimization’: 0, ‘imputation’: 50.439927101135254}}, ‘0.8’: {‘scores’: {‘RMSE’: 1.0379347892718205, ‘MAE’: 0.757440007226372, ‘MI’: 0.0035880775657246428, ‘CORRELATION’: -0.0014975078469404196}, ‘times’: 
{‘contamination’: 0.6166613101959229, ‘optimization’: 0, ‘imputation’: 50.66455388069153}}}}}}}
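The nested dictionary above is keyed by dataset, contamination pattern, algorithm, optimizer, and missing rate. A minimal sketch of reading one entry, using a hand-built single-entry dictionary that mirrors the structure of the example output (values rounded from the example):

results = {
    "drift": {"mcar": {"cdrec": {"bayesian": {
        "0.05": {
            "scores": {"RMSE": 0.233, "MAE": 0.136, "MI": 1.274, "CORRELATION": 0.968},
            "times": {"contamination": 0.001, "optimization": 0, "imputation": 0.092},
        },
    }}}},
}

# dataset -> pattern -> algorithm -> optimizer -> missing rate
entry = results["drift"]["mcar"]["cdrec"]["bayesian"]["0.05"]
print(entry["scores"]["RMSE"], entry["times"]["imputation"])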
- average_runs_by_names(data)[source]¶
Average the results of all runs, grouped by dataset.
Parameters¶
- data : list
List of dictionaries containing the results of the benchmark runs.
Returns¶
- list
List of dictionaries containing the benchmark results averaged by dataset.
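A hedged sketch of averaging two runs over the same dataset (assuming the default no-argument constructor); the run dictionaries are hand-built and mirror the shape of the example output, trimmed to a single algorithm and missing rate:

from imputegap.recovery.benchmark import Benchmark

bench = Benchmark()

run_1 = {"drift": {"mcar": {"cdrec": {"bayesian": {"0.05": {
    "scores": {"RMSE": 0.20, "MAE": 0.12, "MI": 1.2, "CORRELATION": 0.96},
    "times": {"contamination": 0.001, "optimization": 0, "imputation": 0.09}}}}}}}
run_2 = {"drift": {"mcar": {"cdrec": {"bayesian": {"0.05": {
    "scores": {"RMSE": 0.26, "MAE": 0.15, "MI": 1.3, "CORRELATION": 0.97},
    "times": {"contamination": 0.001, "optimization": 0, "imputation": 0.10}}}}}}}

# One averaged dictionary per dataset name found in the runs.
averaged = bench.average_runs_by_names([run_1, run_2])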
- avg_results(*datasets)[source]¶
Calculate the average of all metrics and times across multiple datasets.
Parameters¶
- *datasets : dict
Multiple dataset dictionaries to be averaged.
Returns¶
- list
Matrix with averaged scores and times for all missing-rate levels, the list of algorithm names, and the list of dataset names (see the sketch below).
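An illustrative call combining two single-entry benchmark dictionaries; the unpacking order below (averaged matrix, algorithm names, dataset names) follows the documented return description and is an assumption:

from imputegap.recovery.benchmark import Benchmark

bench = Benchmark()

res_a = {"eeg-alcohol": {"mcar": {"cdrec": {"bayesian": {"0.05": {
    "scores": {"RMSE": 0.23, "MAE": 0.14, "MI": 1.27, "CORRELATION": 0.97},
    "times": {"contamination": 0.001, "optimization": 0, "imputation": 0.09}}}}}}}
res_b = {"drift": {"mcar": {"cdrec": {"bayesian": {"0.05": {
    "scores": {"RMSE": 0.92, "MAE": 0.72, "MI": 0.0, "CORRELATION": 0},
    "times": {"contamination": 0.001, "optimization": 0, "imputation": 0.001}}}}}}}

# Averaged score/time matrix plus the algorithm and dataset name lists.
scores_matrix, algorithms, datasets = bench.avg_results(res_a, res_b)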
- eval(algorithms=['cdrec'], datasets=['eeg-alcohol'], patterns=['mcar'], x_axis=[0.05, 0.1, 0.2, 0.4, 0.6, 0.8], optimizers=['user_def'], save_dir='./reports', runs=1)[source]¶
Execute a comprehensive evaluation of imputation algorithms over multiple datasets and contamination patterns; a usage sketch follows the parameter list.
Parameters¶
- algorithms : list of str
List of imputation algorithms to test.
- datasets : list of str
List of dataset names to evaluate.
- patterns : list of str
List of contamination patterns to apply.
- x_axis : list of float
List of missing rates for contamination.
- optimizers : list of dict
List of optimizers with their configurations.
- save_dir : str, optional
Directory to save reports and plots (default is “./reports”).
- runs : int, optional
Number of runs to execute; the results are averaged over these runs.
Returns¶
- list
List of per-run results together with the matrix of averaged scores and times for all missing-rate levels.
Notes¶
Runs contamination, imputation, and evaluation, then generates plots and a summary report.
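A minimal invocation sketch using the defaults shown in the signature; runtimes vary widely with the chosen algorithms and datasets:

from imputegap.recovery.benchmark import Benchmark

bench = Benchmark()
results = bench.eval(
    algorithms=["cdrec"],
    datasets=["eeg-alcohol"],
    patterns=["mcar"],
    x_axis=[0.05, 0.1, 0.2, 0.4, 0.6, 0.8],
    optimizers=["user_def"],
    save_dir="./reports",
    runs=1,
)
# Per the documented return, `results` bundles the per-run dictionaries and the
# averaged score/time matrix; reports and plots are written under ./reports.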
- generate_heatmap(scores_list, algos, sets, save_dir='./reports', display=True)[source]¶
Generate and save a high-resolution heatmap of RMSE scores across datasets and algorithms; an example call follows the parameter list.
Parameters¶
- scores_list : np.ndarray
2D numpy array containing RMSE values.
- algos : list of str
List of algorithm names (columns of the heatmap).
- sets : list of str
List of dataset names (rows of the heatmap).
- save_dir : str, optional
Directory to save the generated plot (default is “./reports”).
- display : bool, optional
Whether to display the plot (default is True).
Returns¶
- bool
True if the heatmap has been generated.
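An illustrative call with a hand-built RMSE matrix (arbitrary values); rows follow sets and columns follow algos:

import numpy as np
from imputegap.recovery.benchmark import Benchmark

bench = Benchmark()

# 2 datasets (rows) x 3 algorithms (columns) of RMSE values.
rmse = np.array([[0.23, 0.54, 0.95],
                 [0.31, 0.48, 1.01]])

done = bench.generate_heatmap(scores_list=rmse,
                              algos=["cdrec", "stmvl", "mrnn"],
                              sets=["eeg-alcohol", "drift"],
                              save_dir="./reports",
                              display=False)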
- generate_plots(runs_plots_scores, ticks, subplot=False, y_size=4, save_dir='./reports')[source]¶
Generate and save plots for each metric and pattern based on the provided scores; a sketch of the expected input follows this entry.
Parameters¶
- runs_plots_scores : dict
Dictionary containing scores and timing information for each dataset, pattern, and algorithm.
- ticks : list of float
List of missing rates for contamination.
- subplot : bool, optional
If True, generates a single figure with subplots for all metrics (default is False).
- save_dir : str, optional
Directory to save generated plots (default is “./reports”).
Returns¶
None
Notes¶
Saves generated plots in save_dir, categorized by dataset, pattern, and metric.
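A sketch of the expected input, using a single-dataset dictionary shaped like the example output; ticks lists the missing rates that appear as keys:

from imputegap.recovery.benchmark import Benchmark

bench = Benchmark()

runs_plots_scores = {"drift": {"mcar": {"cdrec": {"bayesian": {
    "0.05": {"scores": {"RMSE": 0.23, "MAE": 0.14, "MI": 1.27, "CORRELATION": 0.97},
             "times": {"contamination": 0.001, "optimization": 0, "imputation": 0.09}},
    "0.1": {"scores": {"RMSE": 0.18, "MAE": 0.10, "MI": 1.15, "CORRELATION": 0.98},
            "times": {"contamination": 0.005, "optimization": 0, "imputation": 0.10}}}}}}}

bench.generate_plots(runs_plots_scores, ticks=[0.05, 0.1], subplot=True, save_dir="./reports")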
- generate_reports_excel(runs_plots_scores, save_dir='./reports', dataset='', run=-1)[source]¶
Generate and save an Excel-like text report of metrics and timing for each dataset, algorithm, and pattern.
Parameters¶
- runs_plots_scores : dict
Dictionary containing scores and timing information for each dataset, pattern, and algorithm.
- save_dir : str, optional
Directory to save the Excel-like file (default is “./reports”).
- dataset : str, optional
Name of the dataset, used in the Excel-like file name.
- run : int, optional
Number of the run.
Returns¶
None
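An illustrative call, reusing the same single-entry dictionary shape; the file is written under save_dir:

from imputegap.recovery.benchmark import Benchmark

bench = Benchmark()
runs_plots_scores = {"drift": {"mcar": {"cdrec": {"bayesian": {"0.05": {
    "scores": {"RMSE": 0.23, "MAE": 0.14, "MI": 1.27, "CORRELATION": 0.97},
    "times": {"contamination": 0.001, "optimization": 0, "imputation": 0.09}}}}}}}

bench.generate_reports_excel(runs_plots_scores, save_dir="./reports", dataset="drift", run=0)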
- generate_reports_txt(runs_plots_scores, save_dir='./reports', dataset='', run=-1)[source]¶
Generate and save a text report of metrics and timing for each dataset, algorithm, and pattern.
Parameters¶
- runs_plots_scores : dict
Dictionary containing scores and timing information for each dataset, pattern, and algorithm.
- save_dir : str, optional
Directory to save the report file (default is “./reports”).
- dataset : str, optional
Name of the dataset, used in the report file name.
- run : int, optional
Number of the run.
Returns¶
None
Notes¶
The report is saved in a “report.txt” file in save_dir, organized in sections with headers and results.
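An analogous sketch for the text report; per the note above, the output is written to a “report.txt” file under save_dir:

from imputegap.recovery.benchmark import Benchmark

bench = Benchmark()
runs_plots_scores = {"drift": {"mcar": {"cdrec": {"bayesian": {"0.05": {
    "scores": {"RMSE": 0.23, "MAE": 0.14, "MI": 1.27, "CORRELATION": 0.97},
    "times": {"contamination": 0.001, "optimization": 0, "imputation": 0.09}}}}}}}

bench.generate_reports_txt(runs_plots_scores, save_dir="./reports", dataset="drift", run=0)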