Parallel Computing: From Multicores and GPU's to Petascale (Advances in Parallel Computing)

Publication series: Advances in Parallel Computing

Authors: Chapman B.; Desprez F.; Joubert G.R.

Publisher: IOS Press

Publication year: 2010

E-ISBN: 9781607505303

P-ISBN(Paperback): 9781607505297

Subject: TP338.6 Parallel computers

Keywords: computing technology, computer technology

Language: ENG



Description

Parallel computing technologies have brought dramatic changes to mainstream computing; the majority of today's PCs, laptops and even notebooks incorporate multiprocessor chips with up to four processors. Standard components are increasingly combined with GPUs (Graphics Processing Units), originally designed for high-speed graphics processing, and FPGAs (Field-Programmable Gate Arrays) to build parallel computers with a wide spectrum of high-speed processing functions. The scale of this powerful hardware is limited only by factors such as energy consumption and thermal control. However, in addition to hardware factors, the practical use of petascale and exascale machines is often hampered by the difficulty of developing software that runs effectively and efficiently on such architectures. This book includes selected and refereed papers presented at the 2009 international Parallel Computing conference (ParCo2009), which set out to address these problems. It provides a snapshot of the state of the art of parallel computing technologies in hardware, application and software development. Areas covered include numerical algorithms, grid and cloud computing, and programming, including GPU and Cell programming. The book also includes papers presented at the six mini-symposia held at the conference.

Chapter

Title page

Preface

Conference Organization

ParCo2009 Sponsors

Contents

Invited Talks

Exascale Computing: What Future Architectures Will Mean for the User Community

Making Multi-Cores Mainstream – From Security to Scalability

Numerical Algorithms

Efficiency and Scalability of the Parallel Barnes-Hut Tree Code PEPC

Combining Numerical Iterative Solvers

A Comparative Study of Some Distributed Linear Solvers on Systems Arising from Fluid Dynamics Simulations

Gradient Projection Methods for Image Deblurring and Denoising on Graphics Processors

Parallel Simulations of Seismic Wave Propagation on NUMA Architectures

Aitken-Schwarz and Schur Complement Methods for Time Domain Decomposition

Performance Modeling Tools for Parallel Sparse Linear Algebra Computations

Narrow-Band Reduction Approach of a DRSM Eigensolver on a Multicore-Based Cluster System

Parallel Multistage Preconditioners by Extended Hierarchical Interface Decomposition for Ill-Conditioned Problems

A Comparison of Different Communication Structures for Scalable Parallel Three Dimensional FFTs in First Principles Codes

Parallelization Strategies for ODE Solvers on Multicore Cluster Systems

Evaluation of Parallel Sparse Matrix Partitioning Software for Parallel Multilevel ILU Preconditioning on Shared-Memory Multiprocessors

A Parallel Implementation of the Davidson Method for Generalized Eigenproblems

Bio-Informatics

A Data-Flow Modification of the MUSCLE Algorithm for Multiprocessors and a Web Interface for It

A Parallel Algorithm for the Fixed-Length Approximate String Matching Problem for High Throughput Sequencing Technologies

Computing Alignment Plots Efficiently

Image Processing & Visualisation

Parallelizing the LM OSEM Image Reconstruction on Multi-Core Clusters

Hierarchical Visualization System for High Performance Computing

Real Time Ultrasound Image Sequence Segmentation on Multicores

GRID & Cloud Computing

Processing Applications Composed of Web/Grid Services by Distributed Autonomic and Self-Organizing Workflow Engines

LPT Scheduling Algorithms with Unavailability Constraints Under Uncertainties

Parallel Genetic Algorithm Implementation for BOINC

RPC/MPI Hybrid Implementation of OpenFMO – All Electron Calculations of a Ribosome

When Clouds Become Green: The Green Open Cloud Architecture

A Versatile System for Asynchronous Iterations: From Multithreaded Simulations to Grid Experiments

Programming

Exploiting Object-Oriented Abstractions to Parallelize Sparse Linear Algebra Codes

Handling Massive Parallelism Efficiently: Introducing Batches of Threads

Skeletons for Multi/Many-Core Systems

Efficient Streaming Applications on Multi-Core with FastFlow: The Biosequence Alignment Test-Bed

A Framework for Detailed Multiphase Cloud Modeling on HPC Systems

Extending Task Parallelism for Frequent Pattern Mining

GPU & Cell Programming

Exploring the GPU for Enhancing Parallelism on Color and Texture Analysis

Generalized GEMM Kernels on GPGPUs: Experiments and Applications

Comparison of Modular Arithmetic Algorithms on GPUs

Fast Multipole Method on the Cell Broadband Engine: The Near Field Part

The GPU on the Matrix-Matrix Multiply: Performance Study and Contributions

Performance Measurement of Applications with GPU Acceleration Using CUDA

Compilers & Tools

Conflict Analysis for Heap-Based Data Dependence Detection

Adaptive Parallel Matrix Computing Through Compiler and Run-Time Support

Parallel I/O

High-Throughput Parallel-I/O Using SIONlib for Mesoscopic Particle Dynamics Simulations on Massively Parallel Computers

Tracing Performance of MPI-I/O with PVFS2: A Case Study of Optimization

Communication Runtime

A Historic Knowledge Based Approach for Dynamic Optimization

Evaluation of Task Mapping Strategies for Regular Network Topologies

Benchmark & Performance Tuning

Automatic Performance Tuning of Parallel Mathematical Libraries

Automatic Performance Tuning Approach for Parallel Applications Based on Sparse Linear Solvers

A Flexible, Application- and Platform-Independent Environment for Benchmarking

Fault Tolerance

Optimized Checkpointing Protocols for Data Parallel Programs

Constructing Resilient Communication Infrastructure for Runtime Environments

Industrial Papers

Optimizing Performance and Energy of High Performance Computing Applications

Mini-Symposium "Adaptive Parallel Computing: Latency Toleration, Non-Determinism as a Form of Adaptation, Adaptive Mapping"

An Operational Semantics for S-Net

Mini-Symposium "DEISA: Extreme Computing in an Advanced Supercomputing Environment"

DEISA Mini-Symposium on Extreme Computing in an Advanced Supercomputing Environment

DEISA Extreme Computing Initiative (DECI) and Science Community Support

Application Oriented DEISA Infrastructure Services

Chemical Characterization of Super-Heavy Elements by Relativistic Four-Component DFT

Direct Numerical Simulation of the Turbulent Development of a Round Jet at Reynolds Number 11,000

EUFORIA: Exploring E-Science for Fusion

Mini-Symposium "EuroGPU 2009"

Parallel Computing with GPUs

Porous Rock Simulations and Lattice Boltzmann on GPUs

An Efficient Multi-Algorithms Sparse Linear Solver for GPUs

Abstraction of Programming Models Across Multi-Core and GPGPU Architectures

Modelling Multi-GPU Systems

Throughput Computing on Future GPUs

Mini-Symposium "ParaFPGA-2009: Parallel Computing with FPGA’s"

ParaFPGA: Parallel Computing with Flexible Hardware

Software vs. Hardware Message Passing Implementations for FPGA Clusters

RAPTOR – A Scalable Platform for Rapid Prototyping and FPGA-Based Cluster Computing

Speeding up Combinational Synthesis in an FPGA Cluster

A Highly Parallel FPGA-Based Evolvable Hardware Architecture

Applying Parameterizable Dynamic Configurations to Sequence Alignment

Towards a More Efficient Run-Time FPGA Configuration Generation

ACCFS – Virtual File System Support for Host Coupled Run-Time Reconfigurable FPGAs

Mini-Symposium "Parallel Programming Tools for Multi-Core Architectures"

Parallel Programming Tools for Multi-Core Architectures

Parallel Programming for Multi-Core Architectures

An Approach to Application Performance Tuning

How to Accelerate an Application: A Practical Case Study in Combustion Modelling

From OpenMP to MPI: First Experiments of the STEP Source-to-Source Transformation Tool

Using Multi-Core Architectures to Execute High Performance-Oriented Real-Time Applications

Performance Tool Integration in a GPU Programming Environment: Experiences with TAU and HMPP

An Interface for Integrated MPI Correctness Checking

Enhanced Performance Analysis of Multi-Core Applications with an Integrated Tool-Chain – Using Scalasca and Vampir to Optimise the Metal Forming Simulation FE Software INDEED

Mini-Symposium "Programming Heterogeneous Architectures"

Mini-Symposium on Programming Heterogeneous Architectures

Parallelization Exploration of Wireless Applications Using MPA

Prototyping and Programming Tightly Coupled Accelerators

Simplifying Heterogeneous Embedded Systems Programming Based on OpenMP

Author Index
