Compiling Affine Nested Loops: How to Optimize the Residual Communications after the Alignment Phase

Authors: Dion M.; Randriamaro C.; Robert Y.

ISSN: 0743-7315

Source: Journal of Parallel and Distributed Computing, Vol. 38, Iss. 2, 1996-11, pp. 176-187


Abstract

Minimizing communication overhead when mapping affine loop nests onto distributed memory parallel computers (DMPCs) is a key problem with regard to performance, and many authors have dealt with it. Not all communications are equivalent: local communications (translations), simple communications (horizontal or vertical ones), and structured communications (broadcasts, gathers, scatters, or reductions) are performed much faster on DMPCs than general affine communications. In this paper, we recall the mapping heuristic given by Dion and Robert, which consists of minimizing the number of nonlocal communications, and we focus on the next step: since it is generally impossible to obtain a communication-local mapping, we show how to optimize the residual general communications using structured communications or decompositions into small sequences of simple communications.
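The idea of decomposing a general communication into a short sequence of simple ones can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: the processor grid is two-dimensional, a residual communication is modelled by a 2x2 integer matrix of determinant 1, a horizontal communication by the shear [[1, k], [0, 1]], and a vertical communication by the shear [[1, 0], [k, 1]]. The function names are hypothetical, and the Euclid-style row reduction used here is one standard way to obtain such a factorization, not necessarily the procedure of the paper; it does not try to produce the shortest sequence.

```python
from typing import List, Tuple

Matrix = Tuple[Tuple[int, int], Tuple[int, int]]
Factor = Tuple[str, int]  # ("H", k) or ("V", k); hypothetical encoding


def elementary(kind: str, k: int) -> Matrix:
    """Assumed model: "H" is a horizontal shear, "V" a vertical shear."""
    return ((1, k), (0, 1)) if kind == "H" else ((1, 0), (k, 1))


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return tuple(
        tuple(sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2))
        for i in range(2)
    )


def decompose(t: Matrix) -> List[Factor]:
    """Factor a 2x2 integer matrix of determinant 1 into H/V shears.

    The returned factors F1, ..., Fn satisfy t == F1 * F2 * ... * Fn.
    """
    (a, b), (c, d) = t
    if a * d - b * c != 1:
        raise ValueError("this sketch only handles determinant 1")
    factors: List[Factor] = []

    def row1_minus(q: int) -> None:
        # row1 -= q * row2, i.e. current matrix = H(q) * new matrix
        nonlocal a, b
        a, b = a - q * c, b - q * d
        if q:
            factors.append(("H", q))

    def row2_minus(q: int) -> None:
        # row2 -= q * row1, i.e. current matrix = V(q) * new matrix
        nonlocal c, d
        c, d = c - q * a, d - q * b
        if q:
            factors.append(("V", q))

    # Euclid's algorithm on the first column (a, c).
    while a != 0 and c != 0:
        if abs(a) >= abs(c):
            row1_minus(a // c)
        else:
            row2_minus(c // a)

    # The nonzero entry of the first column is now +1 or -1.
    if a == 0:
        row1_minus(-c)   # a becomes c * c == 1
    elif a == -1:
        row2_minus(1)    # first column (-1, 1)
        row1_minus(-2)   # first column (1, 1)
    row2_minus(c)        # first column (1, 0); no-op if c is already 0
    row1_minus(b)        # d == 1 here, so this clears b
    assert ((a, b), (c, d)) == ((1, 0), (0, 1))
    return factors


def recompose(factors: List[Factor]) -> Matrix:
    result: Matrix = ((1, 0), (0, 1))
    for kind, k in factors:
        result = matmul(result, elementary(kind, k))
    return result
```

For instance, recompose(decompose(((2, 3), (1, 2)))) returns the original matrix, which is a quick way to check a decomposition. Adjacent factors of the same kind could be merged by adding their coefficients to shorten the sequence.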