Researchers funded by the Exascale Computing Project have developed a new algorithm that improves kernel ridge regression methods used for supervised learning problems. The low-rank compression algorithm exploits both shared-memory (intra-node) and distributed-memory (inter-node) parallelism to alleviate the training bottleneck, and it contributes to an exascale-capable ecosystem. It requires much less memory and scales linearly with problem size, which makes exascale-size problems more tractable; traditional kernel ridge regression methods typically scale cubically.
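For context, the cubic cost of the traditional approach comes from solving a dense linear system in the full kernel matrix. The following is a minimal NumPy sketch of that baseline, not the researchers' compressed algorithm; the Gaussian kernel, the function names, and the values of `lam` and `h` are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, h=1.0):
    """Dense Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * h**2))

def krr_fit(X, y, lam=1e-3, h=1.0):
    """Solve (K + lam * I) w = y with a dense solver.

    Storing K takes O(n^2) memory and the solve takes O(n^3) time,
    which is the bottleneck for large training sets.
    """
    K = gaussian_kernel(X, X, h)
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def krr_predict(X_train, w, X_test, h=1.0):
    """Evaluate the fitted model at the test points."""
    return gaussian_kernel(X_test, X_train, h) @ w

# Toy problem (illustrative data only): fit sin(x) from 200 samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
w = krr_fit(X, y, lam=1e-3, h=0.5)

X_test = np.linspace(-2.5, 2.5, 50)[:, None]
pred = krr_predict(X, w, X_test, h=0.5)
max_err = np.max(np.abs(pred - np.sin(X_test[:, 0])))
```

Doubling the number of training samples in this sketch roughly quadruples the memory for `K` and multiplies the solve time by about eight, which is the growth a linearly scaling method avoids.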
The research team compared the algorithm with a state-of-the-art near-optimal Nyström-based method and with a similar approach that requires an intermediate H representation, a step the team eliminated with a faster nearest-neighbors approximation. Numerical experiments in a distributed-memory environment showed reduced time to solution.
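To give a sense of what a Nyström-type baseline looks like, here is a small sketch that uses scikit-learn's generic `Nystroem` feature map followed by a linear ridge model. It is not the specific near-optimal method used in the comparison, and the data, `gamma`, `n_components`, and `alpha` values are assumptions for illustration.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Illustrative data: noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(2000)

# Approximate the RBF kernel with 50 landmark points, then fit a linear
# ridge model on the resulting low-dimensional features.
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=2.0, n_components=50, random_state=0),
    Ridge(alpha=1e-3),
)
model.fit(X, y)

X_test = np.linspace(-2.5, 2.5, 100)[:, None]
nystroem_err = np.max(np.abs(model.predict(X_test) - np.sin(X_test[:, 0])))
```

A subsampling approach like this never forms the full kernel matrix, but its accuracy depends on how many landmark points are chosen and where they fall.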
The researchers also developed a Python interface to scikit-learn, a widely used machine learning tool for classification and regression. The interface allows scikit-learn to leverage a high-performance solver library to achieve performance and memory footprint improvements and makes the researchers’ novel algorithm available to all users.
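The source does not give the import path or class names of the researchers' interface, so the sketch below uses scikit-learn's own `KernelRidge` estimator to illustrate the `fit`/`predict`/`score` convention that a scikit-learn-compatible solver would presumably follow. The data and hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

# Illustrative regression problem in two dimensions.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stock scikit-learn estimator; it solves a dense system internally.
model = KernelRidge(kernel="rbf", gamma=1.0, alpha=1e-3)
model.fit(X_train, y_train)

# Coefficient of determination (R^2) on held-out data.
r2 = model.score(X_test, y_test)
```

Because scikit-learn estimators share this interface, swapping in a different backend would, in principle, change only the line that constructs `model`.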
Chavez, Gustavo, Yang Liu, Pieter Ghysels, Xiaoye Sherry Li, and Elizaveta Rebrova. 2020. “Scalable and Memory-Efficient Kernel Ridge Regression.” 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (May). doi:10.1109/ipdps47924.2020.00102. http://dx.doi.org/10.1109/IPDPS47924.2020.00102.