Parallel Programming Models
XMP-IO Function and Its Application to MapReduce on the K Computer
POLCA - A Programming Model for Large Scale, Strongly Heterogeneous Infrastructures
Exploitation of Quality/Throughput Tradeoffs in Image Processing Through Invasive Computing
An Efficient Thread Mapping Strategy for Multiprogramming on Manycore Processors
A Scalable Farm Skeleton for Heterogeneous Parallel Programming
Towards Truly Boolean Arrays in Data-Parallel Array Processing
Deep Packet Inspection on Commodity Hardware Using FastFlow
Performance Analysis and Tools
Formalizing Bottlenecks in Task-Based OpenMP Applications
Characterizing Performance of Applications on Blue Gene/Q
Specification of Periscope Tuning Framework Plugins
Parallel Numerical Linear Algebra
On Using Speculative Computations for Parallel Reduction to Tridiagonal Form
Fast Approximate Solution of the Non-Symmetric Generalized Eigenvalue Problem on Multicore Architectures
Locality Optimization on a NUMA Architecture for Hybrid LU Factorization
Variable Block Algebraic Recursive Multilevel Solver (VBARMS) for Sparse Linear Systems
A Proposal of a Single-Synchronized Solver Suited to Large Scale Linear Systems on Parallel Computers with Distributed Memory
Approximate Inverse Preconditioners for Krylov Methods on Heterogeneous Parallel Computers
Cache and Energy Efficiency of Sparse Matrix-Vector Multiplication for Different BLAS Numerical Types with the RSB Format
Heterogeneous Sparse Matrix Computations on Hybrid GPU/CPU Platforms
MapReduce Streaming Algorithms for Laplace Relaxation on the Cloud
Space Exploration Using Parallel Orbits: A Study in Parallel Symbolic Computing
SFC-Based Communication Metadata Encoding for Adaptive Mesh Refinement
Graph Repartitioning with Both Dynamic Load and Dynamic Processor Allocation
ForestClaw: Hybrid Forest-of-Octrees AMR for Hyperbolic Conservation Laws
A Space-Time Parallel Solver for the Three-Dimensional Heat Equation
An Efficient Pipelined Implementation of Space-Time Parallel Applications
GPU Computing and Applications
Efficient GPU-Based Optimization of Volume Meshes
Fast Uniform Grid Construction on GPGPUs Using Atomic Operations
Porting Large HPC Applications to GPU Clusters: The Codes GENE and VERTEX
Numerical Simulation of the Low Compressible Viscous Gas Flows on GPU-Based Hybrid Supercomputers
Simulation of Multiphase Flows in the Subsurface on GPU-Based Supercomputers
Atomic Computing - A Different Perspective on Massively Parallel Problems
Parallelisation and Optimisation of Large-Scale Applications
Accelerating SeisSol by Generating Vectorized Code for Sparse Matrix Operators
Experience with the MPI/STARSS Programming Model on a Large Production Code
Exploiting Data- and Task-Parallelism in the Solution of Riccati Equations on Multicore Servers and GPUs
Testing and Implementing Some New Algorithms Using the FFTW Library on Massively Parallel Supercomputers
Performance Measurements of MHD Simulation for Planetary Magnetosphere on Peta-Scale Computer FX10
Parallel Simulations of Self-Propelled Microorganisms
Improving Communication Performance of Sparse Linear Algebra for an Atomistic Simulation Application
NEMORB's Fourier Filter and Distributed Matrix Transposition on Petaflop Systems
Parallel Computing Design for Exact Diagonalization Scheme on Multi-Band Hubbard Cluster Models
Numerical Experiments with New Algorithms for Parallel Decomposition of Large Computational Meshes
A Distributed Algorithm for the Permutation Flow Shop Problem - An Empirical Analysis
GPI2 for GPUs: A PGAS Framework for Efficient Communication in Hybrid Clusters
A Fault Tolerant Implementation of Multi-Level Monte Carlo Methods
High Performance CPU/GPU Multiresolution Poisson Solver
Mini-Symposium "Parallel Computing with FPGAs (ParaFPGA2013)"
ParaFPGA 2013: Harnessing Programs, Power and Performance in Parallel FPGA Applications
High-Level Synthesis Revised: Generation of FPGA Accelerators from a Domain-Specific Language Using the Polyhedron Model
Compiling a Dataflow-Based Language Abstraction onto an FPGA
Timing Driven C-Slow Retiming on RTL for MultiCores on FPGAs
Performance and Resource Modeling for FPGAs Using High-Level Synthesis Tools
Interactive Graph Cuts Using FPGA
An Image Filter System Based on Dynamic Partial Reconfiguration on FPGA
Investigating Energy Consumption of an SRAM-Based FPGA for Duty-Cycle Applications
Mini-Symposium "High-Dimensional Meets Parallel - Algorithms and Applications"
High-Dimensional Meets Parallel: Algorithms and Applications
Global Communication Schemes for the Sparse Grid Combination Technique
Load Balancing for Massively Parallel Computations with the Sparse Grid Combination Technique
A Parallel Fault Tolerant Combination Technique
Managing Complexity in the Parallel Sparse Grid Combination Technique
Scalability and Fault Tolerance of the Alternating Direction Method of Multipliers for Sparse Grids
Mini-Symposium "Application Autotuning for HPC (Architectures)"
Mini-Symposium on Application Autotuning for HPC
Investigating Performance Benefits from OpenACC Kernel Directives
Application-Independent Autotuning for GPUs
Autotuning of Pattern Runtimes for Accelerated Parallel Systems
Empirical Performance Modeling of GPU Kernels Using Active Learning
Crowdtuning: Systematizing Auto-Tuning Using Predictive Modeling and Crowdsourcing
Autotuning the Energy Consumption
Potentials and Limitations for Energy Efficiency Auto-Tuning
Mini-Symposium "Extreme Scaling on SuperMUC"
Extreme Scaling Workshop at the LRZ
Extreme Scaling of Lattice Quantum Chromodynamics
End-to-End Parallel Simulations with APES
Towards Petaflops Capability of the VERTEX Supernova Code
Scaling of the GROMACS 4.6 Molecular Dynamics Code on SuperMUC
Mini-Symposium "Parallel Programming for Heterogeneous Architectures"
Parallel Programming for Heterogeneous Architectures
Execution Schemes for the NPB-MZ Benchmarks on Hybrid Architectures: A Comparative Study
Scilab on a Hybrid Platform
Divide and Conquer Parallelization of Finite Element Method Assembly
Cudagrind: A Valgrind Extension for CUDA
Profiling Hybrid HMPP Applications with Score-P on Heterogeneous Hardware
Binary Instrumentation for Scalable Performance Measurement of OpenMP Applications
A Case Study: Holistic Performance Analysis on Heterogeneous Architectures Using the Vampir Toolchain
Further Mini-Symposium Contributions
PRACE DECI (Distributed European Computing Initiative) Minisymposium
A Generic Prototype to Benchmark Algorithms and Data Structures for Hierarchical Hybrid Grids
Towards a Performance Engineering Workflow for OpenMP 4.0
Theoretical Measures of Cache Efficiency for Tetrahedral Adaptive Meshes. A Case Study with a Quasi Space-Filling Curve Order