Contributing ============ ImputeGAP allows users to integrate their own algorithms. We describe in turn the integration python and other languages. Initialization ~~~~~~~~~~~~~~ Initialize a Git Repository:: $ git init $ git clone https://github.com/eXascaleInfolab/ImputeGAP $ cd ./ImputeGAP A. Python Integration Steps ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Basic Features -------------- 1. Navigate to the ``./imputegap/algorithms`` directory. 2. Create a new file by copying ``mean_impute.py`` and rename it with the name of your algorithm, e.g., ``new_alg.py``. 3. Rename the function ``def mean_impute()``, e.g., ``def new_alg()``. 4. Replace the section under ``# core of the algorithm`` with your algorithm’s implementation. The algorithms should take as input the ``TimeSeries`` object structure and should return a ``numpy.ndarray`` matrix. 5. Navigate to ``./imputegap/recovery/imputation.py``: a. Copy the ``class MeanImpute(BaseImputer)`` into the corresponding class of algorithms' family. b. Rename the class. e.g., ``class NewAlg(BaseImputer)``. c. Change the value of the ``algorithm`` variable from ``mean_impute`` to ``new_alg`` d. In the ``def impute()`` method, replace the call of the function to link into your new algorithm, e.g., .. code-block:: python from imputegap.algorithms.new_alg import new_alg self.recov_data = new_alg(self.incomp_data, params) .. raw:: html

Advanced Features ----------------- I. Initialize default values ____________________________ 1. To set the default values of your algorithm, please update ``./imputegap/env/default_values.toml`` (lines 3-6) and add your configuration. For example:: [new_alg] param_integer = 42 param_float = 0.42 param_string = "value_42" 2. Update the ``./imputegap/tools/utils.py`` file, and specify your configuration in the ``load_parameters`` function. .. raw:: html
II. Benchmark _____________ To access the benchmarking features, please update ``./imputegap/tools/utils.py`` (lines 547-550) by adding your algorithm in the ``def config_impute_algorithm`` function. .. code-block:: python elif algorithm == "new_alg": imputer = Imputation.MyFamily.NewAlg(incomp_data) Replace MyFamily with either: Statistics, MatrixCompletion, PatternSearch, MachineLearning, DeepLearning, or LLMs. .. raw:: html
III. Optimizer ______________ To enable the optimization module, please update ``./imputegap/tools/algorithm_parameters.py``. 1. Open ``./imputegap/tools/algorithm_parameters.py`` copy paste lines 296 to 302 and update the algorithm name and parameters, e.g., .. code-block:: python 'new_alg': { "param_integer": tune.grid_search([i for i in range(2, 20 1)]), "param_float": tune.loguniform(1e-6, 1), "param_string": ["value_1", "value_n"] }, 2. Add your parameters in the ``def save_optimization()`` function of the file ``./imputegap/tools/utils.py`` to save the optimal parameters, line 808 to 815: .. code-block:: python if algorithm == "new_alg": params_to_save = { "param_integer": int(optimal_params[0]), "param_float": float(optimal_params[1]), "param_string": str(optimal_params[2]) } .. raw:: html
IV. Update the call ___________________ Navigate to ``./imputegap/recovery/imputation.py``: Improve the imputation call of the ``NewAlg`` class in the ``def impute()`` function, and add the call of the optimizer and the default values of the parameters. .. code-block:: python if params is not None: param_integer, param_float, param_string = self._check_params(user_def, params) # call the optimizer else: param_integer, param_float, param_string = utils.load_parameters(query="default", algorithm=self.algorithm, verbose=self.verbose) # load the default values self.recov_data = new_alg(incomp_data=self.incomp_data, param_integer=param_integer, param_float=param_float, param_string=param_string, logs=self.logs, verbose=self.verbose) .. raw:: html

B. C++ Integration Steps ~~~~~~~~~~~~~~~~~~~~~~~~ We provide a wrapper that can serve as a template for the integration of users’ code. We will show how to adjust the wrapper in C++. 1. Navigate to the ``./imputegap/algorithms`` directory. 2. Convert your CPP/H files into a shared object format (``.so``) and place them in the ``imputegap/algorithms/lib`` folder. a. Go to ``./imputegap/wrapper/AlgoCollection`` and update the Makefile. Copy commands from ``libSTMVL.so`` or modify them as needed. b. Optionally, copy your C++ project files into the directory. c. Generate the ``.so`` file using the ``make`` command:: make your_lib_name d. To include the .so file in the "in-built" directory, open a command line, navigate to the root directory, and execute the library build process:: rm -rf dist/ python setup.py sdist bdist_wheel 3. Rename ``cpp_integration.py`` to the name of your algorithm. 4. Modify the ``native_algo()`` function: a. Update the shared object parameter to match your shared library. b. Convert input parameters to the appropriate C++ types and pass them to your shared object methods. c. Convert the imputed matrix back to a numpy format. 5. Adapt the template method ``your_algo.py`` with the appropriate parameters, ensuring compatibility with the ``TimeSeries`` object and a ``numpy.ndarray`` return type. 6. Adapt the ``./imputegap/recovery/imputation.py`` by adding a function to call your new algorithm by copying and modifying ``class MeanImpute(BaseImputer)`` as needed. You can copy-paste the class into the corresponding category of algorithms. 7. Perform imputation as needed. .. raw:: html

Example with C++ Algorithm -------------------------- Once your cpp and h files are ready to be converted (you can look at ``./imputegap/wrapper/AlgoCollection/shared/SharedLibCDREC.cpp`` or ``./imputegap/wrapper/AlgoCollection/shared/SharedLibCDREC.h``), create a ``.so`` file for linux and windows, and a ``.dylib`` file for MAC OS. .. tabs:: .. tab:: Windows 1. Modify the Makefile:: libCDREC.so: g++ -O3 -D ARMA_DONT_USE_WRAPPER -fPIC -rdynamic -shared -o lib_cdrec.so -Wall -Werror -Wextra -pedantic \ -Wconversion -Wsign-conversion -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -fopenmp -std=gnu++14 \ Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \ Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \ -lopenblas -larpack 2. Generate the shared library:: make libCDREC.so 3. Place the generated ``.so`` file in ``imputegap/algorithms/lib`` 4. Optional: To include the .so file in the "in-built" directory:: rm -rf dist/ python setup.py sdist bdist_wheel .. raw:: html

.. tab:: Linux 1. Modify the Makefile:: libCDREC.so: g++ -O3 -D ARMA_DONT_USE_WRAPPER -fPIC -rdynamic -shared -o lib_cdrec.so -Wall -Werror -Wextra -pedantic \ -Wconversion -Wsign-conversion -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -fopenmp -std=gnu++14 \ Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \ Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \ -lopenblas -larpack 2. Generate the shared library:: make libCDREC.so 3. Place the generated ``.so`` file in ``imputegap/algorithms/lib`` 4. Optional: To include the .so file in the "in-built" directory:: rm -rf dist/ python setup.py sdist bdist_wheel .. raw:: html

.. tab:: MacOs 1. Modify the Makefile:: libCDREC.dylib: clang++ -dynamiclib -O3 -fPIC -std=c++17 -o lib_cdrec.dylib \ -I/opt/homebrew/include \ -L/opt/homebrew/lib \ -L/opt/homebrew/opt/openblas/lib \ Stats/Correlation.cpp Algorithms/CDMissingValueRecovery.cpp Algebra/Auxiliary.cpp \ Algebra/CentroidDecomposition.cpp shared/SharedLibCDREC.cpp \ -larmadillo -lopenblas -larpack 2. Generate the shared library:: make libCDREC.dylib 3. Place the generated ``.dylib`` file in ``imputegap/algorithms/lib`` 4. Optional: To include the .dylib file in the "in-built" directory:: rm -rf dist/ python setup.py sdist bdist_wheel .. raw:: html

.. raw:: html

**Wrapper** 1. In ``imputegap/algorithms/cpp_integration.py``, update the function name and parameter count, and ensure the ``.so`` file matches:: def native_cdrec(__py_matrix, __py_rank, __py_epsilon, __py_iterations): shared_lib = utils.load_share_lib("lib_cdrec") # in-build files # shared_lib = utils.load_share_lib("./your_path/lib_cdrec.so") # external files 2. Convert variables to corresponding C++ types:: __py_n = len(__py_matrix); __py_m = len(__py_matrix[0]); assert (__py_rank >= 0); assert (__py_rank < __py_m); assert (__py_epsilon > 0); assert (__py_iterations > 0); __ctype_size_n = __native_c_types_import.c_ulonglong(__py_n); __ctype_size_m = __native_c_types_import.c_ulonglong(__py_m); __ctype_rank = __native_c_types_import.c_ulonglong(__py_rank); __ctype_epsilon = __native_c_types_import.c_double(__py_epsilon); __ctype_iterations = __native_c_types_import.c_ulonglong(__py_iterations); __ctype_matrix = __marshal_as_native_column(__py_matrix); 3. Call the C++ algorithm with the required parameters:: shared_lib.cdrec_imputation_parametrized(__ctype_matrix, __ctype_size_n, __ctype_size_m, __ctype_rank, __ctype_epsilon, __ctype_iterations); 4. Convert the imputed matrix back to ``numpy``:: __py_imputed_matrix = __marshal_as_numpy_column(__ctype_matrix, __py_n, __py_m); return __py_imputed_matrix; .. raw:: html

**Method Implementation** 1. In ``imputegap/algorithms/cpp_integration.py``, create or adapt a generic method for your needs:: def cdrec(contamination, truncation_rank, iterations, epsilon, logs=True, lib_path=None): start_time = time.time() # Record start time # Call the C++ function to perform recovery imputed_matrix = native_cdrec(contamination, truncation_rank, epsilon, iterations) end_time = time.time() if logs: print(f"\n\t\t> logs, imputation cdrec - Execution Time: {(end_time - start_time):.4f} seconds\n") return imputed_matrix .. raw:: html

**Imputer Class** 1. Add your algorithm to the catalog in ``./imputegap/recovery/imputation.py`` 2. Copy and modify ``class MeanImpute(BaseImputer)`` to fit your requirements:: class MatrixCompletion: class CDRec(BaseImputer): algorithm = "cdrec" def impute(self, user_defined=True, params=None): self.imputed_matrix = cdrec(contamination=self.infected_matrix, truncation_rank=rank, iterations=iterations, epsilon=epsilon, logs=self.logs) return self