Fundamentals of Stream Processing: Application Design, Systems, and Analytics

Publication subtitle: Application Design, Systems, and Analytics

Authors: Henrique C. M. Andrade; Buğra Gedik; Deepak S. Turaga

Publisher: Cambridge University Press

Publication year: 2014

E-ISBN: 9781107439245

P-ISBN (Paperback): 9781107015548

Subject: TP39 computer application

Keyword: Computer applications

Language: ENG

Description

Stream processing is a novel distributed computing paradigm that supports the gathering, processing, and analysis of high-volume, heterogeneous, continuous data streams to extract insights and actionable results in real time. This comprehensive, hands-on guide, which combines the fundamental building blocks of stream processing with emerging research in the field, is ideal for application designers, system builders, and analytic developers, as well as for students and researchers. The book introduces the key components of the stream computing paradigm, including the distributed system infrastructure, the programming model, design patterns, and streaming analytics. Explanations of the underlying theoretical principles, illustrative examples and implementations in the IBM InfoSphere Streams SPL language, and real-world case studies give students and practitioners a thorough understanding of such applications and of the middleware that supports them.

Contents

2 Introduction to stream processing

2.1 Overview

2.2 Stream processing applications

2.2.1 Network monitoring for cybersecurity

2.2.2 Transportation grid monitoring and optimization

2.2.3 Healthcare and patient monitoring

2.2.4 Discussion

2.3 Information flow processing technologies

2.3.1 Active databases

2.3.2 Continuous queries

2.3.3 Publish–subscribe systems

2.3.4 Complex event processing systems

2.3.5 ETL and SCADA systems

2.4 Stream processing systems

2.4.1 Data

2.4.2 Processing

2.4.3 System architecture

2.4.4 Implementations

2.4.5 Discussion

2.5 Concluding remarks

2.6 Exercises

References

Part II Application development

3 Application development – the basics

3.1 Overview

3.2 Characteristics of SPAs

3.3 Stream processing languages

3.3.1 Features of stream processing languages

3.3.2 Approaches to stream processing language design

3.4 Introduction to SPL

3.4.1 Language origins

3.4.2 A "Hello World" application in SPL

3.5 Common stream processing operators

3.5.1 Stream relational operators

3.5.2 Utility operators

3.5.3 Edge adapter operators

3.6 Concluding remarks

3.7 Programming exercises

References

4 Application development – data flow programming

4.1 Overview

4.2 Flow composition

4.2.1 Static composition

4.2.2 Dynamic composition

4.2.3 Nested composition

4.3 Flow manipulation

4.3.1 Operator state

4.3.2 Selectivity and arity

4.3.3 Using parameters

4.3.4 Output assignments and output functions

4.3.5 Punctuations

4.3.6 Windowing

4.4 Concluding remarks

4.5 Programming exercises

References

5 Large-scale development – modularity, extensibility, and distribution

5.1 Overview

5.2 Modularity and extensibility

5.2.1 Types

5.2.2 Functions

5.2.3 Primitive operators

5.2.4 Composite and custom operators

5.3 Distributed programming

5.3.1 Logical versus physical flow graphs

5.3.2 Placement

5.3.3 Transport

5.4 Concluding remarks

5.5 Programming exercises

References

6 Visualization and debugging

6.1 Overview

6.2 Visualization

6.2.1 Topology visualization

6.2.2 Metrics visualization

6.2.3 Status visualization

6.2.4 Data visualization

6.3 Debugging

6.3.1 Semantic debugging

6.3.2 User-defined operator debugging

6.3.3 Deployment debugging

6.3.4 Performance debugging

6.4 Concluding remarks

References

Part III System architecture

7 Architecture of a stream processing system

7.1 Overview

7.2 Architectural building blocks

7.2.1 Computational environment

7.2.2 Entities

7.2.3 Services

7.3 Architecture overview

7.3.1 Job management

7.3.2 Resource management

7.3.3 Scheduling

7.3.4 Monitoring

7.3.5 Data transport

7.3.6 Fault tolerance

7.3.7 Logging and error reporting

7.3.8 Security and access control

7.3.9 Debugging

7.3.10 Visualization

7.4 Interaction with the system architecture

7.5 Concluding remarks

References

8 InfoSphere Streams architecture

8.1 Overview

8.2 Background and history

8.3 A user's perspective

8.4 Components

8.4.1 Runtime instance

8.4.2 Instance components

8.4.3 Instance backbone

8.4.4 Tooling

8.5 Services

8.5.1 Job management

8.5.2 Resource management and monitoring

8.5.3 Scheduling

8.5.4 Data transport

8.5.5 Fault tolerance

8.5.6 Logging, tracing, and error reporting

8.5.7 Security and access control

8.5.8 Application development support

8.5.9 Processing element

8.5.10 Debugging

8.5.11 Visualization

8.6 Concluding remarks

References

Part IV Application design and analytics

9 Design principles and patterns for stream processing applications

9.1 Overview

9.2 Functional design patterns and principles

9.2.1 Edge adaptation

9.2.2 Flow manipulation

9.2.3 Dynamic adaptation

9.3 Non-functional principles and design patterns

9.3.1 Application design and composition

9.3.2 Parallelization

9.3.3 Performance optimization

9.3.4 Fault tolerance

9.4 Concluding remarks

References

10 Stream analytics: data pre-processing and transformation

10.1 Overview

10.2 The mining process

10.3 Notation

10.4 Descriptive statistics

10.4.1 Illustrative technique: BasicCounting

10.4.2 Advanced reading

10.5 Sampling

10.5.1 Illustrative technique: reservoir sampling

10.5.2 Advanced reading

10.6 Sketches

10.6.1 Illustrative technique: Count-Min sketch

10.6.2 Advanced reading

10.7 Quantization

10.7.1 Illustrative techniques: binary clipping and moment-preserving quantization

10.7.2 Advanced reading

10.8 Dimensionality reduction

10.8.1 Illustrative technique: SPIRIT

10.8.2 Advanced reading

10.9 Transforms

10.9.1 Illustrative technique: the Haar transform

10.9.2 Advanced reading

10.10 Concluding remarks

References

11 Stream analytics: modeling and evaluation

11.1 Overview

11.2 Offline modeling and online evaluation

11.3 Data stream classification

11.3.1 Illustrative technique: VFDT

11.3.2 Advanced reading

11.4 Data stream clustering

11.4.1 Illustrative technique: CluStream microclustering

11.4.2 Advanced reading

11.5 Data stream regression

11.5.1 Illustrative technique: linear regression with SGD

11.5.2 Advanced reading

11.6 Data stream frequent pattern mining

11.6.1 Illustrative technique: lossy counting

11.6.2 Advanced reading

11.7 Anomaly detection

11.7.1 Illustrative technique: micro-clustering-based anomaly detection

11.7.2 Advanced reading

11.8 Concluding remarks

References

Part V Case studies

12 Applications

12.1 Overview

12.2 The Operations Monitoring application

12.2.1 Motivation

12.2.2 Requirements

12.2.3 Design

12.2.4 Analytics

12.2.5 Fault tolerance

12.3 The Patient Monitoring application

12.3.1 Motivation

12.3.2 Requirements

12.3.3 Design

12.3.4 Evaluation

12.4 The Semiconductor Process Control application

12.4.1 Motivation

12.4.2 Requirements

12.4.3 Design

12.4.4 Evaluation

12.4.5 User interface

12.5 Concluding remarks

References

Part VI Closing notes

13 Conclusion

13.1 Book summary

13.2 Challenges and open problems

13.2.1 Software engineering

13.2.2 Integration

13.2.3 Scaling up and distributed computing

13.2.4 Analytics

13.3 Where do we go from here?

References

Keywords and identifiers index

Index
