A Designer's Guide to Asynchronous VLSI

Authors: Peter A. Beerel; Recep O. Ozdag; Marcos Ferretti

Publisher: Cambridge University Press

Publication year: 2010

E-ISBN: 9780511669392

P-ISBN (Paperback): 9780521872447

Subject: TN47 Large-scale integrated circuits (VLSIC)

Keyword: General industrial technology

Language: ENG

Description

Create low-power, higher-performance circuits with shorter design times using this practical guide to asynchronous design. This practical alternative to conventional synchronous design enables performance close to that of full-custom designs, with design times that approach those of commercially available ASIC standard-cell flows. It includes design trade-offs, specific design examples, and end-of-chapter exercises. Emphasis throughout is placed on practical techniques and real-world applications, making this ideal for circuit design students interested in alternative design styles and system-on-chip circuits, as well as for circuit designers in industry who need new solutions to old problems.

Chapter

2.1.5 Pull channels

2.1.6 Abstract channel diagrams

2.2 Sequencing and concurrency

2.2.1 Enclosed handshaking

SEQ module

PAR module

Transferer

2.2.2 Pipelined handshaking

Full buffers versus half buffers

Non-linear pipelines

2.3 Asynchronous memories and holding state

2.4 Arbiters

2.4.1 Non-pipelined arbiters

2.4.2 Pipelined arbiters

2.5 Design examples

2.5.1 Two-place FIFO

2.5.2 The 2 x 2 asynchronous crossbar

2.6 Exercises

References

3: Modeling channel-based designs

3.1 Communicating sequential processes

3.2 Using asynchronous-specific languages

3.3 Using software programming languages

3.4 Using existing hardware design languages

3.5 Modeling channel communication in Verilog

3.5.1 Using send and receive macros

3.5.2 Using synchronization channels and probes

3.5.3 Using enclosed handshaking macros

3.5.4 Modeling the dining-philosophers problem in VerilogCSP

3.5.5 Modeling a 2 x 2 asynchronous crossbar in VerilogCSP

3.6 Implementing VerilogCSP macros

3.6.1 Send and receive macros

3.6.2 Synchronization macros

3.6.3 Probe macros

3.6.4 Enclosed handshaking macros

3.7 Debugging in VerilogCSP

3.7.1 When does deadlock happen?

3.7.2 Monitoring the state of ports and channels

3.8 Summary of VerilogCSP macros

3.9 Exercises

References

4: Pipeline performance

4.1 Block metrics

4.1.1 Forward latency

4.1.2 Local cycle time

4.1.3 Backward latency

4.2 Linear pipelines

4.2.1 Homogeneous linear pipelines

4.2.2 Series composition of linear pipelines

4.2.3 Improving throughput

4.3 Pipeline loops

4.3.1 Design example: implementation of Euclid's algorithm

4.3.2 Performance analysis of rings

4.3.3 Improving ring throughput

4.4 Forks and joins

4.5 More complex pipelines

4.6 Exercises

References

5: Performance analysis and optimization

5.1 Petri nets

5.1.1 Petri net types

5.1.2 Reachability graph

5.1.3 Modeling delays in Petri nets

5.1.4 Cycle time

5.2 Modeling pipelines using channel nets

5.2.1 Full-buffer channel nets

5.2.2 Cycle time and throughput

5.3 Performance analysis

5.3.1 Shortest-path-based algorithm

5.3.2 Linear-programming-based approaches

5.3.3 Relation to maximum-cycle-mean problem

5.3.4 Karp's algorithm

5.4 Performance optimization

5.4.1 Slack matching: an intuitive analysis

5.4.2 Slack matching: an MILP optimization framework

5.5 Advanced topic: stochastic performance analysis

5.6 Exercises

References

6: Deadlock

6.1 Deadlock caused by incorrect circuit design

6.2 Deadlock caused by architectural token mismatch

6.2.1 Data token starvation

6.2.2 Bubble starvation

6.3 Deadlock caused by arbitration

6.3.1 Arbiter deadlock Example 1

6.3.2 Arbiter deadlock Example 2

Reference

7: A taxonomy of design styles

7.1 Delay models

7.1.1 Delay-insensitive design

7.1.2 Quasi-delay-insensitive design

7.1.3 Speed-independent design

7.1.4 Scalable delay-insensitive design

7.1.5 Bounded-delay design

7.2 Timing constraints

7.3 Input-output mode versus fundamental mode

7.4 Logic styles

7.4.1 Static logic

7.4.2 Dynamic logic

7.4.3 Muller C-element implementations

7.4.4 Asymmetric C-elements

7.5 Datapath design

7.5.1 Bundled data

7.5.2 Quasi-delay-insensitive design

7.5.3 Hybrid techniques

7.5.4 Indictability

7.6 Design flows: an overview of approaches

7.6.1 Communicating sequential process language refinement

7.6.2 Syntax-driven translation

7.6.3 Gate-level netlist translation

7.6.4 High-level synthesis-based approaches

7.7 Exercises

References

8: Synthesis-based controller design

8.1 Fundamental-mode Huffman circuits

8.1.1 Burst-mode design

8.1.2 Burst-mode circuits

8.1.3 Burst-mode specification

8.1.4 Hazards

Static hazards

Dynamic hazards

8.1.5 Burst-mode design example

8.2 STG-based design

8.2.1 STG example

8.2.2 CAD tools for STG-based controller design

8.3 Exercises

References

9: Micropipeline design

9.1 Two-phase micropipelines

9.1.1 Non-linear pipelines

9.1.2 Resource sharing

9.1.3 Arbitration

9.1.4 Event-module implementations

9.2 Four-phase micropipelines

9.3 True-four-phase pipelines

9.4 Delay line design

9.4.1 Asymmetric delay line templates

9.4.2 Symmetric delay line templates

9.4.3 Power-efficient asymmetric delay line

9.5 Other micropipeline techniques

9.6 Exercises

References

10: Syntax-directed translation

10.1 Tangram

10.1.1 Data types

10.1.2 Primitive commands

10.1.3 Composite commands

10.2 Handshake components

10.3 Translation algorithm

10.4 Control component implementation

10.5 Datapath component implementations

10.5.1 QDI implementations

10.5.2 Single-rail implementations

10.6 Peephole optimizations

10.7 Self-initialization

10.8 Testability

10.9 Design examples

10.9.1 Tangram digital compact cassette error corrector

10.9.2 Balsa SAMIPS: an asynchronous MIPS R3000 processor

10.9.3 Balsa SPA: an asynchronous ARM V5T processor

10.9.4 Haste ARM996HS: a commercially available asynchronous ARM core

10.10 Summary

10.11 Exercises

References

11: Quasi-delay-insensitive pipeline templates

11.1 Weak-conditioned half buffer

11.2 Precharged half buffer

11.2.1 PCHB full adder

11.2.2 Conditional reading and writing

11.2.3 PCHB reset

11.2.4 PCHB register

11.3 Precharged full buffer

11.4 Why input-completion sensing?

LCD-RCD merging

11.5 Reduced-stack precharged half buffer (RSPCHB)

11.5.1 Conditional reading and writing RSPCHB

11.5.2 RSPCHB register

11.5.3 Loops using RSPCHB

11.6 Reduced-stack precharged full buffer (RSPCFB)

11.7 Quantitative comparisons

11.8 Token insertion

11.9 Arbiter

11.10 Exercises

References

12: Timed pipeline templates

12.1 Williams' PS0 pipeline

12.2 Lookahead pipelines overview

12.3 Dual-rail lookahead pipelines

12.3.1 The LP3/1 pipeline

12.3.2 The LP2/2 pipeline

12.3.3 The LP2/1 pipeline

12.4 Single-rail lookahead pipelines

12.4.1 The LPSR2/2 pipeline

12.4.2 The LPSR2/1 pipeline

12.5 High-capacity pipelines (single-rail)

12.6 Designing non-linear pipeline structures

12.6.1 Slow and stalled right-hand environments in forks

12.6.2 Slow and stalled left-hand environments in joins

12.7 Lookahead pipelines (single-rail)

12.7.1 Solution 1 for LPSR2/2

12.7.2 Solution 2 for LPSR2/2

12.7.3 Pipeline cycle time

12.8 Lookahead pipelines (dual-rail)

12.8.1 Joins

12.8.2 Forks

12.9 High-capacity pipelines (single-rail)

12.9.1 Handling forks and joins

12.9.2 Pipeline cycle time

12.10 Conditionals

12.11 Loops

12.12 Simulation results

12.13 Summary

References

13: Single-track pipeline templates

13.1 Introduction

13.2 GasP bundled data

13.3 Pulsed logic

13.4 Single-track full-buffer template

13.4.1 Static single-track full-buffer (SSTFB) template

13.4.2 The 10-transition template

13.5 STFB pipeline stages

13.5.1 STFB buffer

13.5.2 STFB fork

13.5.3 STFB join

13.5.4 STFB merge

13.5.5 STFB split

13.5.6 STFB arbiter

13.5.7 STFB kpg adder

13.5.8 Shared channels

13.5.9 Bit generators and buckets

13.5.10 Token insertion

13.6 STFB standard-cell implementation

13.6.1 Transistor-sizing strategy

13.6.2 Output sub-cell STFB_POUT

13.6.3 The RCD sizing

13.6.4 Input channel reset transistors

13.6.5 Direct-path current analysis

13.6.6 Performance analysis

13.7 Back-end design flow and library development

13.8 The evaluation and demonstration chip

13.8.1 The prefix adder

13.8.2 The input circuitry

13.8.3 The output circuitry

13.8.4 The chip implementation

13.8.5 Power distribution and electromigration

13.8.6 Comparisons

13.8.7 Demonstration-chip implementation and test

13.8.8 Test results

13.9 Conclusions and open questions

13.9.1 N-stack height limit

13.9.2 Electron migration effect

13.10 Exercises

References

14: Asynchronous crossbar

14.1 Fulcrum's Nexus asynchronous crossbar

14.1.1 The crossbar

14.1.2 Input control

14.1.3 Output control

14.2 Clock domain converter

14.2.1 Synchronization control circuit

14.2.2 Clock domain converter datapath

14.2.3 Latency

14.2.4 Noise analysis

14.2.5 Characterization results

References

15: Design example: the Fano algorithm

15.1 The Fano algorithm

15.1.1 Background of the algorithm

15.1.2 The synchronous design

Normalization and its benefits

Register-transfer-level design

Synchronous chip implementation

15.2 The asynchronous Fano algorithm

15.2.1 The asynchronous Fano architecture

15.2.2 The skip-ahead unit

15.2.3 The memory design

15.2.4 The fast data and decision registers

15.2.5 Simulation results and comparison

15.3 An asynchronous semi-custom physical design flow

15.3.1 Physical design flow using standard CAD tools

References

Index