NIC-based reduction algorithms for large-scale clusters

Author: Petrini, Fabrizio; Moody, Adam; Fernandez, Juan; Frachtenberg, Eitan; Panda, Dhabaleswar K.

ISSN: 1740-0562

Source: International Journal of High Performance Computing and Networking, Vol. 4, Iss. 3-4, 2006-08, pp. 122-136



Abstract

Efficient reduction algorithms are crucial to many large-scale, parallel scientific applications. While previous algorithms constrain processing to the host CPU, we explore and utilise the processors in modern cluster Network Interface Cards (NICs). We present the design issues, solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Through experiments on the ALC cluster at Lawrence Livermore National Laboratory, which connects 960 dual-CPU nodes with the Quadrics QsNet interconnect, we find NIC-based reductions to be more efficient than host-based implementations. At large scale, our NIC-based reductions are more than twice as fast as the host-based, production-level MPI implementation.
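
The abstract does not describe the algorithms themselves. As a generic illustration of what a reduction computes, the following is a minimal sketch that simulates a binomial-tree reduction in a single process. The tree pattern, the function name `tree_reduce`, and the use of integer values are all assumptions made for illustration; none of them is taken from the paper, and the sketch does not model NIC offload or the QsNet hardware.

```python
import operator


def tree_reduce(values, op):
    """Simulate a binomial-tree reduction over len(values) ranks.

    Hypothetical helper, not the paper's algorithm. Each list slot stands
    for one rank's local contribution. Returns (result, rounds), where the
    combined result ends up at rank 0.
    """
    buf = list(values)
    n = len(buf)
    if n == 0:
        raise ValueError("need at least one rank")

    rounds = 0
    step = 1
    while step < n:
        # In this round, rank r + step "sends" its partial result to rank r,
        # for every r that is a multiple of 2 * step.
        for r in range(0, n, 2 * step):
            partner = r + step
            if partner < n:
                buf[r] = op(buf[r], buf[partner])
        step *= 2
        rounds += 1
    return buf[0], rounds


if __name__ == "__main__":
    # 960 ranks, matching the node count quoted in the abstract.
    total, rounds = tree_reduce(range(960), operator.add)
    print(total, rounds)  # prints "460320 10"
```

Under this assumed pattern the number of communication rounds grows logarithmically with the number of ranks (10 rounds for 960 ranks), which is why per-round latency matters at large scale.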