Embedded Computing for High Performance: Efficient Mapping of Computations Using Customization, Code Transformations and Compilation

Publication subtitle: Efficient Mapping of Computations Using Customization, Code Transformations and Compilation

Authors: Cardoso, João Manuel Paiva; Coutinho, José Gabriel de Figueiredo; Diniz, Pedro C.

Publisher: Elsevier Science

Publication year: 2017

E-ISBN: 9780128041994

P-ISBN (Paperback): 9780128041895

Subject: TP302.1 (overall design, system design)

Keywords: computer software; algorithm theory; computing and computer technology; automation technology and equipment

Language: ENG


Description

Embedded Computing for High Performance: Design Exploration and Customization Using High-level Compilation and Synthesis Tools provides a set of real-life example implementations that migrate traditional desktop systems to embedded systems. Working with popular hardware, including Xilinx and ARM, the book offers a comprehensive description of techniques for mapping computations expressed in programming languages such as C or MATLAB to high-performance embedded architectures consisting of multiple CPUs, GPUs, and reconfigurable hardware (FPGAs).

The authors demonstrate a domain-specific language (LARA) that facilitates retargeting to multiple computing systems using the same source code. In this way, users can decouple original application code from transformed code and enhance productivity and program portability.
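As a flavor of this decoupling, a LARA aspect might look like the following minimal sketch. It is illustrative only, built from the select/apply/insert constructs named in Section 3.5 of the table of contents; the aspect name, join points, and inserted statement are hypothetical, not taken from the book:

```
// Hypothetical LARA aspect: instrument every loop in every function
// with a counter, without touching the original application source.
aspectdef CountLoopIterations
  select function.loop end   // join points: all loops inside functions
  apply
    $loop.insert before %{ loop_counter++; }%;   // weave code before each loop
  end
end
```

The key point the paragraph makes is that strategies like this live in separate aspect files, so the same C or MATLAB source can be rewoven for different targets without manual edits.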

After reading this book, engineers will understand the processes, methodologies, and best practices needed for the development of applications for high-performance embedded computing systems.

  • Focuses on maximizing performance while managing energy consumption in embedded systems
  • Explains how to retarget code for heterogeneous systems with GPUs and FPGAs
  • Demonstrates a domain-specific language that facilitates migrating and retargeting existing applications to modern systems
  • Includes downloadable slides, tools, and tutorials

Contents

Acknowledgments

Abbreviations

Chapter 1: Introduction

1.1. Overview

1.2. Embedded Systems in Society and Industry

1.3. Embedded Computing Trends

1.4. Embedded Systems: Prototyping and Production

1.5. About LARA: An Aspect-Oriented Approach

1.6. Objectives and Target Audience

1.7. Complementary Bibliography

1.8. Dependences in Terms of Knowledge

1.9. Examples and Benchmarks

1.10. Book Organization

1.11. Intended Use

1.12. Summary

References

Chapter 2: High-performance embedded computing

2.1. Introduction

2.2. Target Architectures

2.2.1. Hardware Accelerators as Coprocessors

2.2.2. Multiprocessor and Multicore Architectures

2.2.3. Heterogeneous Multiprocessor/Multicore Architectures

2.2.4. OpenCL Platform Model

2.3. Core-Based Architectural Enhancements

2.3.1. Single Instruction, Multiple Data Units

2.3.2. Fused Multiply-Add Units

2.3.3. Multithreading Support

2.4. Common Hardware Accelerators

2.4.1. GPU Accelerators

2.4.2. Reconfigurable Hardware Accelerators

2.4.3. SoCs With Reconfigurable Hardware

2.5. Performance

2.5.1. Amdahl's Law

2.5.2. The Roofline Model

2.5.3. Worst-Case Execution Time Analysis

2.6. Power and Energy Consumption

2.6.1. Dynamic Power Management

2.6.2. Dynamic Voltage and Frequency Scaling

2.6.3. Dark Silicon

2.7. Comparing Results

2.8. Summary

2.9. Further Reading

References

Chapter 3: Controlling the design and development cycle

3.1. Introduction

3.2. Specifications in MATLAB and C: Prototyping and Development

3.2.1. Abstraction Levels

3.2.2. Dealing With Different Concerns

3.2.3. Dealing With Generic Code

3.2.4. Dealing With Multiple Targets

3.3. Translation, Compilation, and Synthesis Design Flows

3.4. Hardware/Software Partitioning

3.4.1. Static Partitioning

3.4.2. Dynamic Partitioning

3.5. LARA: A Language for Specifying Strategies

3.5.1. Select and Apply

3.5.2. Insert Action

3.5.3. Exec and Def Actions

3.5.4. Invoking Aspects

3.5.5. Executing External Tools

3.5.6. Compilation and Synthesis Strategies in LARA

3.6. Summary

3.7. Further Reading

References

Chapter 4: Source code analysis and instrumentation

4.1. Introduction

4.2. Analysis and Metrics

4.3. Static Source Code Analysis

4.3.1. Data Dependences

4.3.2. Code Metrics

4.4. Dynamic Analysis: The Need for Instrumentation

4.4.1. Information From Profiling

4.4.2. Profiling Example

4.5. Custom Profiling Examples

4.5.1. Finding Hotspots

4.5.2. Loop Metrics

4.5.3. Dynamic Call Graphs

4.5.4. Branch Frequencies

4.5.5. Heap Memory

4.6. Summary

4.7. Further Reading

References

Chapter 5: Source code transformations and optimizations

5.1. Introduction

5.2. Basic Transformations

5.3. Data Type Conversions

5.4. Code Reordering

5.5. Data Reuse

5.6. Loop-Based Transformations

5.6.1. Loop Alignment

5.6.2. Loop Coalescing

5.6.3. Loop Flattening

5.6.4. Loop Fusion and Loop Fission

5.6.5. Loop Interchange and Loop Permutation (Loop Reordering)

5.6.6. Loop Peeling

5.6.7. Loop Shifting

5.6.8. Loop Skewing

5.6.9. Loop Splitting

5.6.10. Loop Stripmining

5.6.11. Loop Tiling (Loop Blocking)

5.6.12. Loop Unrolling

5.6.13. Unroll and Jam

5.6.14. Loop Unswitching

5.6.15. Loop Versioning

5.6.16. Software Pipelining

5.6.17. Evaluator-Executor Transformation

5.6.18. Loop Perforation

5.6.19. Other Loop Transformations

5.6.20. Overview

5.7. Function-Based Transformations

5.7.1. Function Inlining/Outlining

5.7.2. Partial Evaluation and Code Specialization

5.7.3. Function Approximation

5.8. Data Structure-Based Transformations

5.8.1. Scalar Expansion, Array Contraction, and Array Scalarization

5.8.2. Scalar and Array Renaming

5.8.3. Arrays and Records

5.8.4. Reducing the Number of Dimensions of Arrays

5.8.5. From Arrays to Pointers and Array Recovery

5.8.6. Array Padding

5.8.7. Representation of Matrices and Graphs

5.8.8. Object Inlining

5.8.9. Data Layout Transformations

5.8.10. Data Replication and Data Distribution

5.9. From Recursion to Iterations

5.10. From Nonstreaming to Streaming

5.11. Data and Computation Partitioning

5.11.1. Data Partitioning

5.11.2. Partitioning Computations

5.11.3. Computation Offloading

5.12. LARA Strategies

5.13. Summary

5.14. Further Reading

References

Chapter 6: Code retargeting for CPU-based platforms

6.1. Introduction

6.2. Retargeting Mechanisms

6.3. Parallelism and Compiler Options

6.3.1. Parallel Execution Opportunities

6.3.2. Compiler Options

6.3.3. Compiler Phase Selection and Ordering

6.4. Loop Vectorization

6.5. Shared Memory (Multicore)

6.6. Distributed Memory (Multiprocessor)

6.7. Cache-Based Program Optimizations

6.8. LARA Strategies

6.8.1. Capturing Heuristics to Control Code Transformations

6.8.2. Parallelizing Code With OpenMP

6.8.3. Monitoring an MPI Application

6.9. Summary

6.10. Further Reading

References

Chapter 7: Targeting heterogeneous computing platforms

7.1. Introduction

7.2. Roofline Model Revisited

7.3. Workload Distribution

7.4. Graphics Processing Units

7.5. High-Level Synthesis

7.6. LARA Strategies

7.7. Summary

7.8. Further Reading

References

Chapter 8: Additional topics

8.1. Introduction

8.2. Design Space Exploration

8.2.1. Single-Objective Optimization and Single/Multiple Criteria

8.2.2. Multiobjective Optimization, Pareto Optimal Solutions

8.2.3. DSE Automation

8.3. Hardware/Software Codesign

8.4. Runtime Adaptability

8.4.1. Tuning Application Parameters

8.4.2. Adaptive Algorithms

8.4.3. Resource Adaptivity

8.5. Automatic Tuning (Autotuning)

8.5.1. Search Space

8.5.2. Static and Dynamic Autotuning

8.5.3. Models for Autotuning

8.5.4. Autotuning Without Dynamic Compilation

8.5.5. Autotuning With Dynamic Compilation

8.6. Using LARA for Exploration of Code Transformation Strategies

8.7. Summary

8.8. Further Reading

References

Glossary

Index

Back Cover
