Contributing
============
ImputeGAP allows users to integrate their own algorithms. We describe in turn the integration of algorithms written in Python and in other languages such as C++.
Initialization
~~~~~~~~~~~~~~
Clone the Git repository and move into it::

    $ git clone https://github.com/eXascaleInfolab/ImputeGAP
    $ cd ./ImputeGAP

A. Python Integration Steps
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Basic Features
--------------
1. Navigate to the ``./imputegap/algorithms`` directory.
2. Create a new file by copying ``mean_impute.py`` and rename it with the name of your algorithm, e.g., ``new_alg.py``.
3. Rename the function ``def mean_impute()``, e.g., ``def new_alg()``.
4. Replace the section under ``# core of the algorithm`` with your algorithm’s implementation. The algorithm should take the ``TimeSeries`` object structure as input and return a ``numpy.ndarray`` matrix.
5. Navigate to ``./imputegap/recovery/imputation.py``:

   a. Copy ``class MeanImpute(BaseImputer)`` into the class that corresponds to your algorithm's family.
   b. Rename the class, e.g., ``class NewAlg(BaseImputer)``.
   c. Change the value of the ``algorithm`` variable from ``mean_impute`` to ``new_alg``.
   d. In the ``def impute()`` method, replace the function call so that it points to your new algorithm, e.g.,

      .. code-block:: python

          from imputegap.algorithms.new_alg import new_alg
          self.recov_data = new_alg(self.incomp_data, params)

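
As an illustration of steps 1 to 4, the sketch below shows what a minimal ``new_alg.py`` could look like. It is not taken from the ImputeGAP code base: the column-mean strategy only stands in for your own algorithm, the ``params`` argument is a hypothetical placeholder, and the function is assumed to receive the incomplete matrix as a ``numpy.ndarray`` in which missing values are encoded as ``NaN``.

.. code-block:: python

    import numpy as np


    def new_alg(incomp_data, params=None):
        """Hypothetical algorithm: fill each missing value with the mean of its column."""
        recov_data = np.array(incomp_data, dtype=float)  # work on a copy of the input

        # core of the algorithm
        col_means = np.nanmean(recov_data, axis=0)       # per-column mean, ignoring NaN
        rows, cols = np.where(np.isnan(recov_data))      # positions of the missing values
        recov_data[rows, cols] = col_means[cols]

        return recov_data


    incomp_data = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]])
    recov_data = new_alg(incomp_data)

The input matrix is left untouched and the returned matrix no longer contains ``NaN`` values, which is the contract that the ``impute()`` method relies on.
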
Advanced Features
-----------------
I. Initialize default values
____________________________
1. To set the default values of your algorithm, please update ``./imputegap/env/default_values.toml`` (lines 3-6) and add your configuration. For example::

      [new_alg]
      param_integer = 42
      param_float = 0.42
      param_string = "value_42"

2. Update the ``./imputegap/tools/utils.py`` file, and specify your configuration in the ``load_parameters`` function.
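
To make the link between these two steps concrete, the sketch below parses a ``[new_alg]`` table with Python's standard ``tomllib`` module (Python 3.11 or later) and returns the values as a tuple. The helper name ``load_new_alg_defaults`` is hypothetical, and the actual ``load_parameters`` implementation may be organized differently.

.. code-block:: python

    import tomllib

    # Same content as the example table added to default_values.toml
    DEFAULT_VALUES = """
    [new_alg]
    param_integer = 42
    param_float = 0.42
    param_string = "value_42"
    """


    def load_new_alg_defaults(toml_text=DEFAULT_VALUES):
        """Hypothetical helper: return the default parameters of new_alg as a tuple."""
        config = tomllib.loads(toml_text)["new_alg"]
        return (
            int(config["param_integer"]),
            float(config["param_float"]),
            str(config["param_string"]),
        )


    param_integer, param_float, param_string = load_new_alg_defaults()

Returning the values in a fixed order keeps the configuration file and the code that unpacks the parameters in sync.
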
II. Benchmark
_____________
To access the benchmarking features, update ``./imputegap/tools/utils.py`` (lines 547-550) by adding your algorithm to the ``def config_impute_algorithm`` function.

.. code-block:: python

    elif algorithm == "new_alg":
        imputer = Imputation.MyFamily.NewAlg(incomp_data)

Replace ``MyFamily`` with one of: ``Statistics``, ``MatrixCompletion``, ``PatternSearch``, ``MachineLearning``, ``DeepLearning``, or ``LLMs``.

III. Optimizer
______________
To enable the optimization module, update ``./imputegap/tools/algorithm_parameters.py`` and ``./imputegap/tools/utils.py`` as follows.

1. Open ``./imputegap/tools/algorithm_parameters.py``, copy and paste lines 296 to 302, and update the algorithm name and parameters, e.g.,

   .. code-block:: python

       'new_alg': {
           "param_integer": tune.grid_search([i for i in range(2, 20, 1)]),
           "param_float": tune.loguniform(1e-6, 1),
           "param_string": ["value_1", "value_n"]
       },

2. Add your parameters to the ``def save_optimization()`` function in ``./imputegap/tools/utils.py`` (lines 808 to 815) so that the optimal parameters are saved:

   .. code-block:: python

       if algorithm == "new_alg":
           params_to_save = {
               "param_integer": int(optimal_params[0]),
               "param_float": float(optimal_params[1]),
               "param_string": str(optimal_params[2])
           }

IV. Update the call
___________________
Navigate to ``./imputegap/recovery/imputation.py`` and extend the ``def impute()`` function of the ``NewAlg`` class so that it calls the optimizer when ``params`` is given and loads the default parameter values otherwise.

.. code-block:: python

    if params is not None:
        # call the optimizer
        param_integer, param_float, param_string = self._check_params(user_def, params)
    else:
        # load the default values
        param_integer, param_float, param_string = utils.load_parameters(query="default", algorithm=self.algorithm, verbose=self.verbose)

    self.recov_data = new_alg(incomp_data=self.incomp_data, param_integer=param_integer, param_float=param_float, param_string=param_string, logs=self.logs, verbose=self.verbose)

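
The fallback logic can be tried in isolation with the self-contained sketch below. Everything in it is a stand-in rather than ImputeGAP code: ``DEFAULTS`` replaces the values loaded from ``default_values.toml``, ``resolve_params`` is a hypothetical helper, and ``new_alg`` only echoes the arguments it receives.

.. code-block:: python

    # Stand-in for the values stored in default_values.toml
    DEFAULTS = {"param_integer": 42, "param_float": 0.42, "param_string": "value_42"}


    def resolve_params(params=None):
        """Hypothetical helper: user-defined parameters win, defaults fill the gaps."""
        if params is not None:
            merged = {**DEFAULTS, **params}
        else:
            merged = dict(DEFAULTS)
        return merged["param_integer"], merged["param_float"], merged["param_string"]


    def new_alg(incomp_data, param_integer, param_float, param_string):
        """Stand-in algorithm that returns the data and the parameters it received."""
        return incomp_data, (param_integer, param_float, param_string)


    # No parameters given: the defaults are used
    _, used_defaults = new_alg([[1.0, None]], *resolve_params())

    # One parameter overridden: the others keep their default values
    _, used_custom = new_alg([[1.0, None]], *resolve_params({"param_integer": 7}))

Resolving the parameters before the algorithm is called keeps the algorithm itself free of configuration logic.
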
B. C++ Integration Steps
~~~~~~~~~~~~~~~~~~~~~~~~
We provide a wrapper that can serve as a template for the integration of users’ code. The steps below show how to adapt this wrapper to a C++ algorithm.

1. Navigate to the ``./imputegap/algorithms`` directory.
2. Convert your CPP/H files into a shared object (``.so``) and place it in the ``imputegap/algorithms/lib`` folder.

   a. Go to ``./imputegap/wrapper/AlgoCollection`` and update the Makefile. Copy the commands from ``libSTMVL.so`` or modify them as needed.
   b. Optionally, copy your C++ project files into the directory.
   c. Generate the ``.so`` file using the ``make`` command::

         make your_lib_name

   d. To include the ``.so`` file in the "in-built" directory, open a command line, navigate to the root directory, and run the library build process::

         rm -rf dist/
         python setup.py sdist bdist_wheel

3. Rename ``cpp_integration.py`` to the name of your algorithm.
4. Modify the ``native_algo()`` function:

   a. Update the shared object parameter to match your shared library.
   b. Convert input parameters to the appropriate C++ types and pass them to your shared object methods.
   c. Convert the imputed matrix back to a ``numpy`` format.

5. Adapt the template method ``your_algo.py`` with the appropriate parameters, ensuring compatibility with the ``TimeSeries`` object and a ``numpy.ndarray`` return type.
6. In ``./imputegap/recovery/imputation.py``, add a class that calls your new algorithm: copy ``class MeanImpute(BaseImputer)`` into the corresponding category of algorithms and modify it as needed.
7. Perform imputation as needed.
Example with C++ Algorithm
--------------------------
Once your ``.cpp`` and ``.h`` files are ready to be converted (see ``./imputegap/wrapper/AlgoCollection/shared/SharedLibCDREC.cpp`` and ``./imputegap/wrapper/AlgoCollection/shared/SharedLibCDREC.h`` for an example), create a ``.so`` file for Linux and Windows, or a ``.dylib`` file for macOS.

.. tabs::

   .. tab:: Windows

      1. Modify the Makefile::

            libCDREC.so:
                g++ -O3 -D ARMA_DONT_USE_WRAPPER -fPIC -rdynamic -shared -o lib_cdrec.so -Wall -Werror -Wextra -pedantic \
                -Wconversion -Wsign-conversion -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -fopenmp -std=gnu++14 \
                Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \
                Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \
                -lopenblas -larpack

      2. Generate the shared library::

            make libCDREC.so

      3. Place the generated ``.so`` file in ``imputegap/algorithms/lib``.

      4. Optional: To include the ``.so`` file in the "in-built" directory::

            rm -rf dist/
            python setup.py sdist bdist_wheel

   .. tab:: Linux

      1. Modify the Makefile::

            libCDREC.so:
                g++ -O3 -D ARMA_DONT_USE_WRAPPER -fPIC -rdynamic -shared -o lib_cdrec.so -Wall -Werror -Wextra -pedantic \
                -Wconversion -Wsign-conversion -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -fopenmp -std=gnu++14 \
                Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \
                Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \
                -lopenblas -larpack

      2. Generate the shared library::

            make libCDREC.so

      3. Place the generated ``.so`` file in ``imputegap/algorithms/lib``.

      4. Optional: To include the ``.so`` file in the "in-built" directory::

            rm -rf dist/
            python setup.py sdist bdist_wheel

   .. tab:: macOS

      1. Modify the Makefile::

            libCDREC.dylib:
                clang++ -dynamiclib -O3 -fPIC -std=c++17 -o lib_cdrec.dylib \
                -I/opt/homebrew/include \
                -L/opt/homebrew/lib \
                -L/opt/homebrew/opt/openblas/lib \
                Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \
                Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \
                -larmadillo -lopenblas -larpack

      2. Generate the shared library::

            make libCDREC.dylib

      3. Place the generated ``.dylib`` file in ``imputegap/algorithms/lib``.

      4. Optional: To include the ``.dylib`` file in the "in-built" directory::

            rm -rf dist/
            python setup.py sdist bdist_wheel

**Wrapper**
1. In ``imputegap/algorithms/cpp_integration.py``, update the function name and the number of parameters, and make sure the name of the ``.so`` file matches::

      def native_cdrec(__py_matrix, __py_rank, __py_epsilon, __py_iterations):

          shared_lib = utils.load_share_lib("lib_cdrec")  # in-built files
          # shared_lib = utils.load_share_lib("./your_path/lib_cdrec.so")  # external files

2. Convert variables to corresponding C++ types::

      __py_n = len(__py_matrix)
      __py_m = len(__py_matrix[0])

      assert (__py_rank >= 0)
      assert (__py_rank < __py_m)
      assert (__py_epsilon > 0)
      assert (__py_iterations > 0)

      __ctype_size_n = __native_c_types_import.c_ulonglong(__py_n)
      __ctype_size_m = __native_c_types_import.c_ulonglong(__py_m)
      __ctype_rank = __native_c_types_import.c_ulonglong(__py_rank)
      __ctype_epsilon = __native_c_types_import.c_double(__py_epsilon)
      __ctype_iterations = __native_c_types_import.c_ulonglong(__py_iterations)
      __ctype_matrix = __marshal_as_native_column(__py_matrix)

3. Call the C++ algorithm with the required parameters::

      shared_lib.cdrec_imputation_parametrized(__ctype_matrix, __ctype_size_n, __ctype_size_m, __ctype_rank, __ctype_epsilon, __ctype_iterations)

4. Convert the imputed matrix back to ``numpy``::

      __py_imputed_matrix = __marshal_as_numpy_column(__ctype_matrix, __py_n, __py_m)
      return __py_imputed_matrix

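
The two marshalling helpers called in steps 2 and 4 are not shown in this guide. As a rough sketch of what such helpers may do, the code below copies a matrix into a flat ``ctypes`` array of doubles and back, using ``numpy`` and the standard ``ctypes`` module. The column-major layout is an assumption suggested by the helper names, and ``marshal_as_native_column`` and ``marshal_as_numpy_column`` are simplified stand-ins, not the private helpers of ``cpp_integration.py``.

.. code-block:: python

    import ctypes

    import numpy as np


    def marshal_as_native_column(py_matrix):
        """Copy a 2-D matrix into a flat, column-major ctypes array of doubles."""
        flat = np.asarray(py_matrix, dtype=np.float64).flatten(order="F")
        return (ctypes.c_double * flat.size)(*flat.tolist())


    def marshal_as_numpy_column(ctype_matrix, n, m):
        """Rebuild an (n, m) numpy matrix from a flat, column-major ctypes array."""
        flat = np.array(ctype_matrix[:], dtype=np.float64)
        return flat.reshape((n, m), order="F")


    matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    native = marshal_as_native_column(matrix)  # the array a shared library would receive
    restored = marshal_as_numpy_column(native, *matrix.shape)

Because the native array is modified in place by the C++ code in the wrapper above, the same buffer is read back after the call to obtain the imputed matrix.
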
**Method Implementation**
1. In ``imputegap/algorithms/cpp_integration.py``, create or adapt a generic method for your needs::

      import time

      def cdrec(contamination, truncation_rank, iterations, epsilon, logs=True, lib_path=None):
          start_time = time.time()  # record start time

          # call the C++ function to perform recovery
          imputed_matrix = native_cdrec(contamination, truncation_rank, epsilon, iterations)

          end_time = time.time()
          if logs:
              print(f"\n\t\t> logs, imputation cdrec - Execution Time: {(end_time - start_time):.4f} seconds\n")

          return imputed_matrix

**Imputer Class**
1. Add your algorithm to the catalog in ``./imputegap/recovery/imputation.py``.
2. Copy and modify ``class MeanImpute(BaseImputer)`` to fit your requirements::

      class MatrixCompletion:
          class CDRec(BaseImputer):
              algorithm = "cdrec"

              def impute(self, user_defined=True, params=None):
                  # rank, iterations, and epsilon come from params or from the default values (loading omitted here)
                  self.imputed_matrix = cdrec(contamination=self.infected_matrix, truncation_rank=rank, iterations=iterations, epsilon=epsilon, logs=self.logs)
                  return self