Author: Petrini Fabrizio, Moody Adam, Fernandez Juan, Frachtenberg Eitan, Panda Dhabaleswar K.
Publisher: Inderscience Publishers
ISSN: 1740-0562
Source: International Journal of High Performance Computing and Networking, Vol. 4, Iss. 3-4, 2006-08, pp. 122-136
Abstract
Efficient reduction algorithms are crucial to many large-scale, parallel scientific applications. While previous algorithms confine processing to the host CPU, we explore and utilise the processors in modern cluster Network Interface Cards (NICs). We present the design issues, solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Through experiments on the ALC cluster at Lawrence Livermore National Laboratory, which connects 960 dual-CPU nodes with the Quadrics QsNet interconnect, we find NIC-based reductions to be more efficient than host-based implementations. At large scale, our NIC-based reductions are more than twice as fast as the host-based, production-level MPI implementation.
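To make the idea of a tree-structured reduction and its analytical cost model concrete, the sketch below simulates a generic binomial-tree reduction to rank 0 and counts its communication rounds. It is an illustration under stated assumptions, not the authors' NIC-based algorithm: the abstract does not specify the tree shape, and the names `binomial_tree_reduce` and `reduce_latency_model`, along with the per-round cost parameters `alpha_us` and `combine_us`, are hypothetical.

```python
from typing import Callable, List, Tuple


def binomial_tree_reduce(
    values: List[float],
    op: Callable[[float, float], float] = lambda a, b: a + b,
) -> Tuple[float, int]:
    """Simulate a binomial-tree reduction of one value per rank to rank 0.

    Hypothetical illustration: in each round, ranks that are odd multiples of
    `step` send their partial result to rank (sender - step), which combines
    it with `op`. Returns (result held by rank 0, number of rounds).
    """
    p = len(values)
    if p == 0:
        raise ValueError("need at least one rank")
    partial = list(values)  # partial[r] is the running partial result at rank r
    rounds = 0
    step = 1
    while step < p:
        # All sends in one round are independent, so they can proceed in parallel.
        for sender in range(step, p, 2 * step):
            receiver = sender - step
            partial[receiver] = op(partial[receiver], partial[sender])
        rounds += 1
        step *= 2
    return partial[0], rounds


def reduce_latency_model(p: int, alpha_us: float, combine_us: float) -> float:
    """Hypothetical latency model: ceil(log2 p) rounds, each paying one message
    latency (alpha_us) plus one combine step (combine_us), in microseconds."""
    if p < 1:
        raise ValueError("need at least one rank")
    rounds = (p - 1).bit_length()  # equals ceil(log2 p) for p >= 1
    return rounds * (alpha_us + combine_us)


if __name__ == "__main__":
    total, rounds = binomial_tree_reduce([1.0, 2.0, 3.0, 4.0, 5.0])
    print(total, rounds)  # 15.0 3
```

In a model of this kind, moving the combine step onto the NIC processor would show up as a change in the per-round terms rather than in the logarithmic round count; the parameter values passed to the model are placeholders, not measurements from the paper.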