Chapter
2.1.6 Abstract channel diagrams
2.2 Sequencing and concurrency
2.2.1 Enclosed handshaking
2.2.2 Pipelined handshaking
Full buffers versus half buffers
2.3 Asynchronous memories and holding state
2.4.1 Non-pipelined arbiters
The 2 × 2 asynchronous crossbar
3: Modeling channel-based designs
3.1 Communicating sequential processes
3.2 Using asynchronous-specific languages
3.3 Using software programming languages
3.4 Using existing hardware design languages
3.5 Modeling channel communication in Verilog
3.5.1 Using send and receive macros
3.5.2 Using synchronization channels and probes
3.5.3 Using enclosed handshaking macros
3.5.4 Modeling the dining-philosophers problem in VerilogCSP
3.5.5 Modeling a 2 × 2 asynchronous crossbar in VerilogCSP
3.6 Implementing VerilogCSP macros
3.6.1 Send and receive macros
3.6.2 Synchronization macros
3.6.4 Enclosed handshaking macros
3.7 Debugging in VerilogCSP
3.7.1 When does deadlock happen?
3.7.2 Monitoring the state of ports and channels
3.8 Summary of VerilogCSP macros
4.2.1 Homogeneous linear pipelines
4.2.2 Series composition of linear pipelines
4.2.3 Improving throughput
4.3.1 Design example: implementation of Euclid's algorithm
4.3.2 Performance analysis of rings
4.3.3 Improving ring throughput
4.5 More complex pipelines
5: Performance analysis and optimization
5.1.3 Modeling delays in Petri nets
5.2 Modeling pipelines using channel nets
5.2.1 Full-buffer channel nets
5.2.2 Cycle time and throughput
5.3.1 Shortest-path-based algorithm
5.3.2 Linear-programming-based approaches
5.3.3 Relation to maximum-cycle-mean problem
5.4 Performance optimization
5.4.1 Slack matching: an intuitive analysis
5.4.2 Slack matching: an MILP optimization framework
5.5 Advanced topic: stochastic performance analysis
6.1 Deadlock caused by incorrect circuit design
6.2 Deadlock caused by architectural token mismatch
6.2.1 Data token starvation
6.3 Deadlock caused by arbitration
6.3.1 Arbiter deadlock Example 1
6.3.2 Arbiter deadlock Example 2
7: A taxonomy of design styles
7.1.1 Delay-insensitive design
7.1.2 Quasi-delay-insensitive design
7.1.3 Speed-independent design
7.1.4 Scalable delay-insensitive design
7.1.5 Bounded-delay design
7.3 Input-output mode versus fundamental mode
7.4.3 Muller C-element implementations
7.4.4 Asymmetric C-elements
7.5.2 Quasi-delay-insensitive design
7.6 Design flows: an overview of approaches
7.6.1 Communicating sequential process language refinement
7.6.2 Syntax-driven translation
7.6.3 Gate-level netlist translation
7.6.4 High-level synthesis-based approaches
8: Synthesis-based controller design
8.1 Fundamental-mode Huffman circuits
8.1.2 Burst-mode circuits
8.1.3 Burst-mode specification
8.1.5 Burst-mode design example
8.2.2 CAD tools for STG-based controller design
9.1 Two-phase micropipelines
9.1.1 Non-linear pipelines
9.1.4 Event-module implementations
9.2 Four-phase micropipelines
9.3 True-four-phase pipelines
9.4.1 Asymmetric delay line templates
9.4.2 Symmetric delay line templates
9.4.3 Power-efficient asymmetric delay line
9.5 Other micropipeline techniques
10: Syntax-directed translation
10.1.2 Primitive commands
10.1.3 Composite commands
10.2 Handshake components
10.3 Translation algorithm
10.4 Control component implementation
10.5 Datapath component implementations
10.5.1 QDI implementations
10.5.2 Single-rail implementations
10.6 Peephole optimizations
10.9.1 Tangram digital compact cassette error corrector
10.9.2 Balsa SAMIPS - an asynchronous MIPS R3000 processor
10.9.3 Balsa SPA - an asynchronous ARM V5T processor
10.9.4 Haste ARM996HS: a commercially available asynchronous ARM core
11: Quasi-delay-insensitive pipeline templates
11.1 Weak-conditioned half buffer
11.2 Precharged half buffer
11.2.2 Conditional reading and writing
11.3 Precharged full buffer
11.4 Why input-completion sensing?
11.5 Reduced-stack precharged half buffer (RSPCHB)
11.5.1 Conditional reading and writing RSPCHB
11.5.3 Loops using RSPCHB
11.6 Reduced-stack precharged full buffer (RSPCFB)
11.7 Quantitative comparisons
12: Timed pipeline templates
12.1 Williams' PS0 pipeline
12.2 Lookahead pipelines overview
12.3 Dual-rail lookahead pipelines
12.3.1 The LP3/1 pipeline
12.3.2 The LP2/2 pipeline
12.3.3 The LP2/1 pipeline
12.4 Single-rail lookahead pipelines
12.4.1 The LPSR2/2 pipeline
12.4.2 The LPSR2/1 pipeline
12.5 High-capacity pipelines (single-rail)
12.6 Designing non-linear pipeline structures
12.6.1 Slow and stalled right-hand environments in forks
12.6.2 Slow and stalled left-hand environments in joins
12.7 Lookahead pipelines (single-rail)
12.7.1 Solution 1 for LPSR2/2
12.7.2 Solution 2 for LPSR2/2
12.7.3 Pipeline cycle time
12.8 Lookahead pipelines (dual-rail)
12.9 High-capacity pipelines (single-rail)
12.9.1 Handling forks and joins
12.9.2 Pipeline cycle time
13: Single-track pipeline templates
13.4 Single-track full-buffer template
13.4.1 Static single-track full-buffer (SSTFB) template
13.4.2 The 10-transition template
13.5 STFB pipeline stages
13.5.9 Bit generators and buckets
13.6 STFB standard-cell implementation
13.6.1 Transistor-sizing strategy
13.6.2 Output sub-cell STFB_POUT
13.6.4 Input channel reset transistors
13.6.5 Direct-path current analysis
13.6.6 Performance analysis
13.7 Back-end design flow and library development
13.8 The evaluation and demonstration chip
13.8.2 The input circuitry
13.8.3 The output circuitry
13.8.4 The chip implementation
13.8.5 Power distribution and electromigration
13.8.7 Demonstration-chip implementation and test
13.9 Conclusions and open questions
13.9.1 N-stack height limit
13.9.2 Electron migration effect
14: Asynchronous crossbar
14.1 Fulcrum's Nexus asynchronous crossbar
14.2 Clock domain converter
14.2.1 Synchronization control circuit
14.2.2 Clock domain converter datapath
14.2.5 Characterization results
15: Design example: the Fano algorithm
15.1.1 Background of the algorithm
15.1.2 The synchronous design
Normalization and its benefits
Register-transfer-level design
Synchronous chip implementation
15.2 The asynchronous Fano algorithm
15.2.1 The asynchronous Fano architecture
15.2.2 The skip-ahead unit
15.2.4 The fast data and decision registers
15.2.5 Simulation results and comparison
15.3 An asynchronous semi-custom physical design flow
15.3.1 Physical design flow using standard CAD tools