Description
A practical guide to help you tackle different real-time data processing and analytics problems using the best tools for each scenario
About This Book
• Learn about the various challenges in real-time data processing and use the right tools to overcome them
• This book covers popular tools and frameworks such as Spark, Flink, and Apache Storm to solve your distributed processing problems
• A practical guide filled with examples, tips, and tricks to help you perform efficient Big Data processing in real-time
Who This Book Is For
If you are a Java developer who would like to be equipped with all the tools required to devise an end-to-end practical solution for real-time data streaming, then this book is for you. Basic knowledge of real-time processing is helpful, and familiarity with Maven, the shell, and Eclipse is also useful.
What You Will Learn
• Get an introduction to the established real-time stack
• Understand the key integration of all the components
• Get a thorough understanding of the basic building blocks for real-time solution designing
• Add the search and visualization aspects to your real-time solution
• Get conceptually and practically acquainted with real-time analytics
• Be well equipped to apply the knowledge and create your own solutions
In Detail
With the rise of Big Data, there is an increasing need to process large amounts of data continuously, with a shorter turnaround time. Real-time data processing involves continuous input, processing, and output of data, with the condition that the time required for processing is kept as short as possible.
Table of Contents
Chapter 1: Introducing Real-Time Analytics
Real-time analytics – the myth and the reality
Near real-time solution – an architecture that works
Lambda architecture – analytics possibilities
IoT – thoughts and possibilities
Cloud – considerations for NRT and IoT
Chapter 2: Real Time Applications – The Basic Ingredients
The NRT system and its building blocks
Analytical layer – serve it to the end user
NRT – high-level system view
Transformation and processing
Chapter 3: Understanding and Tailing Data Streams
Understanding data streams
Setting up infrastructure for data ingestion
Tapping data from source to the processor – expectations and caveats
Comparing and choosing what works best for your use case
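Chapter 3's ingestion discussion centres on pulling data off a queue or log and handing it to the processing layer. Purely as a flavour of what that looks like in Java, here is a minimal sketch of a Kafka producer pushing one record to a broker; the broker address, topic name, and JSON payload are placeholder assumptions, not the book's example.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SensorDataProducer {
        public static void main(String[] args) {
            // Minimal producer configuration; the broker address is an assumption.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Publish one record to a hypothetical "sensor-data" topic.
                producer.send(new ProducerRecord<>("sensor-data", "sensor-1", "{\"temp\":21.4}"));
            }
        }
    }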
Chapter 4: Setting up the Infrastructure for Storm
Storm architecture and its components
Setting up and configuring Storm
Real-time processing job on Storm
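To give a feel for the real-time processing job that Chapter 4 runs, the sketch below wires a trivial spout and bolt into a topology and submits it in local mode. It assumes a Storm 1.x dependency; NumberSpout and PrinterBolt are illustrative stand-ins, not the book's job.

    import java.util.Map;
    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;

    public class HelloStormTopology {

        // Spout that emits an incrementing counter value once per second.
        public static class NumberSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            private long counter = 0;

            @Override
            public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void nextTuple() {
                Utils.sleep(1000);
                collector.emit(new Values(counter++));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("number"));
            }
        }

        // Bolt that simply prints whatever it receives.
        public static class PrinterBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                System.out.println("received: " + tuple.getLongByField("number"));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // Terminal bolt; emits nothing downstream.
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("numbers", new NumberSpout());
            builder.setBolt("printer", new PrinterBolt()).shuffleGrouping("numbers");

            // Run in local mode for a quick test, then shut down.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("hello-storm", new Config(), builder.createTopology());
            Utils.sleep(10000);
            cluster.shutdown();
        }
    }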
Chapter 5: Configuring Apache Spark and Flink
Setting up and a quick execution of Spark
Setting up and a quick execution of Flink
Setting up and a quick execution of Apache Beam
MinimalWordCount example walkthrough
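Chapter 5 walks through Beam's MinimalWordCount example. The sketch below follows the same shape, assuming a Beam 2.x Java SDK with the direct runner on the classpath; the input and output paths are placeholders.

    import java.util.Arrays;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.Filter;
    import org.apache.beam.sdk.transforms.FlatMapElements;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class MinimalWordCountSketch {
        public static void main(String[] args) {
            PipelineOptions options = PipelineOptionsFactory.create();
            Pipeline p = Pipeline.create(options);

            p.apply(TextIO.read().from("input.txt"))               // input path is an assumption
             .apply(FlatMapElements.into(TypeDescriptors.strings())
                     .via((String line) -> Arrays.asList(line.split("[^\\p{L}]+"))))
             .apply(Filter.by((String word) -> !word.isEmpty()))
             .apply(Count.perElement())
             .apply(MapElements.into(TypeDescriptors.strings())
                     .via((KV<String, Long> wordCount) ->
                             wordCount.getKey() + ": " + wordCount.getValue()))
             .apply(TextIO.write().to("wordcounts"));

            p.run().waitUntilFinish();
        }
    }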
Chapter 6: Integrating Storm with a Data Source
RabbitMQ – messaging that works
RabbitMQ – publish and subscribe
RabbitMQ – integration with Storm
PubNub data stream publisher
Stringing together the Storm-RMQ-PubNub sensor data topology
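Chapter 6's RabbitMQ publish and subscribe section boils down to declaring a queue and pushing messages onto it. Below is a minimal publisher sketch with the RabbitMQ Java client; the host, queue name, and payload are assumptions for illustration.

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class SensorPublisher {
        public static void main(String[] args) throws Exception {
            // Connect to a local broker; the host and queue name are assumptions.
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();

            // Non-durable, non-exclusive, non-auto-delete queue with no extra arguments.
            channel.queueDeclare("sensor-queue", false, false, false, null);

            String message = "{\"sensorId\":\"s1\",\"lat\":28.61,\"lon\":77.21}";
            channel.basicPublish("", "sensor-queue", null, message.getBytes("UTF-8"));
            System.out.println("Sent: " + message);

            channel.close();
            connection.close();
        }
    }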
Chapter 7: From Storm to Sink
Setting up and configuring Cassandra
Storm and Cassandra topology
Storm and IMDB (in-memory database) integration for dimensional data
Integrating the presentation layer with Storm
Setting up Grafana with the Elasticsearch plugin
Installing the Elasticsearch plugin in Grafana
Adding the Elasticsearch datasource in Grafana
Visualizing the output on Grafana
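The sink side of Chapter 7 persists processed tuples into Cassandra. Purely as an illustration, here is a sketch using the DataStax Java driver (3.x-style API) to create a keyspace and table and insert one row; the contact point, keyspace, and schema are assumptions.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CassandraSinkSketch {
        public static void main(String[] args) {
            // Contact point, keyspace, and table definitions are placeholder assumptions.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                session.execute("CREATE KEYSPACE IF NOT EXISTS demo "
                        + "WITH replication = {'class':'SimpleStrategy','replication_factor':1}");
                session.execute("CREATE TABLE IF NOT EXISTS demo.readings "
                        + "(sensor_id text, ts timestamp, value double, PRIMARY KEY (sensor_id, ts))");
                session.execute("INSERT INTO demo.readings (sensor_id, ts, value) "
                        + "VALUES ('s1', toTimestamp(now()), 21.4)");
            }
        }
    }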
Chapter 8: Storm Trident
State retention and the need for Trident
Opaque transactional Spout
Basic Storm Trident topology
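Chapter 8's Trident material is easiest to picture with the classic word-count topology: a stream is split into words, grouped, and counted into managed state via persistentAggregate. The sketch below follows that canonical example and assumes a Storm 1.x Trident dependency; it is not the book's exact code.

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.trident.TridentTopology;
    import org.apache.storm.trident.operation.BaseFunction;
    import org.apache.storm.trident.operation.TridentCollector;
    import org.apache.storm.trident.operation.builtin.Count;
    import org.apache.storm.trident.testing.FixedBatchSpout;
    import org.apache.storm.trident.testing.MemoryMapState;
    import org.apache.storm.trident.tuple.TridentTuple;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    public class TridentWordCountSketch {

        // Splits a sentence field into individual word tuples.
        public static class Split extends BaseFunction {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                for (String word : tuple.getString(0).split(" ")) {
                    collector.emit(new Values(word));
                }
            }
        }

        public static void main(String[] args) {
            // Test spout that replays small fixed batches of sentences.
            FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
                    new Values("the cow jumped over the moon"),
                    new Values("four score and seven years ago"));
            spout.setCycle(true);

            TridentTopology topology = new TridentTopology();
            topology.newStream("sentences", spout)
                    .each(new Fields("sentence"), new Split(), new Fields("word"))
                    .groupBy(new Fields("word"))
                    // persistentAggregate keeps word counts as managed, transactional state.
                    .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));

            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("trident-wordcount", new Config(), topology.build());
        }
    }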
Chapter 9: Working with Spark
Spark framework and schedulers
Distinct advantages of Spark
When to avoid using Spark
Spark architecture – working inside the engine
RDD – the name says it all
Spark 2.x – advent of data frames and datasets
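For Chapter 9's Spark 2.x DataFrames and Datasets, a short sketch helps: a local SparkSession reads a CSV file into a Dataset<Row> and runs a simple aggregation. The file name and column names are assumptions.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DataFrameSketch {
        public static void main(String[] args) {
            // Local SparkSession; the input file and schema are assumptions.
            SparkSession spark = SparkSession.builder()
                    .appName("dataframe-sketch")
                    .master("local[*]")
                    .getOrCreate();

            // A DataFrame is simply Dataset<Row> in Spark 2.x.
            Dataset<Row> readings = spark.read()
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .csv("readings.csv");

            readings.printSchema();
            readings.groupBy("sensor_id").avg("value").show();

            spark.stop();
        }
    }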
Chapter 10: Working with Spark Operations
Spark – packaging and API
RDD pragmatic exploration
Shared variables – broadcast variables and accumulators
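Chapter 10's shared variables are easiest to see side by side: a broadcast variable ships a read-only lookup table to the executors once, while an accumulator counts events for the driver to read after an action. The sketch below shows both in local mode; the lookup data is made up for illustration.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;
    import org.apache.spark.util.LongAccumulator;

    public class SharedVariablesSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("shared-variables").setMaster("local[*]");
            JavaSparkContext jsc = new JavaSparkContext(conf);

            // Broadcast variable: a small lookup table shipped once to every executor.
            Map<String, String> countryCodes = new HashMap<>();
            countryCodes.put("IN", "India");
            countryCodes.put("US", "United States");
            Broadcast<Map<String, String>> lookup = jsc.broadcast(countryCodes);

            // Accumulator: a write-only counter the driver can read after the action.
            LongAccumulator unknown = jsc.sc().longAccumulator("unknown-codes");

            jsc.parallelize(Arrays.asList("IN", "US", "FR"))
               .map(code -> {
                   String name = lookup.value().get(code);
                   if (name == null) {
                       unknown.add(1);
                       return code;
                   }
                   return name;
               })
               .collect()
               .forEach(System.out::println);

            System.out.println("Unknown codes: " + unknown.value());
            jsc.close();
        }
    }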
Chapter 11: Spark Streaming
Spark Streaming – introduction and architecture
Packaging structure of Spark Streaming
Spark Streaming operations
Connecting Kafka to Spark Streaming
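Connecting Kafka to Spark Streaming, as Chapter 11 covers, typically uses the direct stream API from the spark-streaming-kafka-0-10 module. Below is a sketch that prints each record's payload every micro-batch; the broker, group id, and topic are assumptions.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToSparkStreamingSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("kafka-streaming").setMaster("local[2]");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            // Consumer settings; broker, group id, and topic are assumptions.
            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "localhost:9092");
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "nrt-demo");
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(
                            jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(
                                    Arrays.asList("sensor-data"), kafkaParams));

            // Print the payload of each record in every 5-second micro-batch.
            stream.map(ConsumerRecord::value).print();

            jssc.start();
            jssc.awaitTermination();
        }
    }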
Chapter 12: Working with Apache Flink
Flink architecture and execution engine
Flink basic components and processes
Integration of source stream to Flink
Integration with Apache Kafka
Integration with RabbitMQ
Flink processing and computation
Integration with Cassandra
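Chapter 12's Kafka integration with Flink comes down to adding a Kafka consumer as a source and applying DataStream transformations. The sketch below assumes a Flink 1.x project with the flink-connector-kafka-0.10 dependency (the exact consumer class and schema package vary by Flink version); the broker, group id, and topic are placeholders.

    import java.util.Properties;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    public class FlinkKafkaSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Kafka consumer properties; broker, group id, and topic are assumptions.
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "flink-demo");

            DataStream<String> stream = env.addSource(
                    new FlinkKafkaConsumer010<>("sensor-data", new SimpleStringSchema(), props));

            // Uppercase each message and print it, just to show a transformation in the pipeline.
            stream.map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value.toUpperCase();
                }
            }).print();

            env.execute("flink-kafka-sketch");
        }
    }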
Chapter 13: Case Study
Setting up the infrastructure
Implementing the case study
Building the data simulator
Check distance and alert bolt
Generate Vehicle static value
Visualization using Kibana
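The case study's check distance and alert bolt is specific to the book's vehicle-tracking scenario, so the sketch below is only a hypothetical illustration of the shape such a bolt takes: it computes the great-circle distance between each reading and a fixed base point and emits an alert tuple when a threshold is crossed. The field names, coordinates, and threshold are all assumptions, not the book's code.

    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Hypothetical bolt: flags vehicles that stray more than a fixed distance from a base point.
    public class CheckDistanceAndAlertBolt extends BaseBasicBolt {

        private static final double BASE_LAT = 28.61;   // assumed base location
        private static final double BASE_LON = 77.21;
        private static final double THRESHOLD_KM = 5.0; // assumed alert threshold

        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String vehicleId = input.getStringByField("vehicleId");
            double lat = input.getDoubleByField("latitude");
            double lon = input.getDoubleByField("longitude");

            double distanceKm = haversineKm(BASE_LAT, BASE_LON, lat, lon);
            if (distanceKm > THRESHOLD_KM) {
                collector.emit(new Values(vehicleId, distanceKm));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("vehicleId", "distanceKm"));
        }

        // Great-circle distance between two lat/lon points in kilometres.
        private static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
            double dLat = Math.toRadians(lat2 - lat1);
            double dLon = Math.toRadians(lon2 - lon1);
            double a = Math.pow(Math.sin(dLat / 2), 2)
                    + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                    * Math.pow(Math.sin(dLon / 2), 2);
            return 6371.0 * 2 * Math.asin(Math.sqrt(a));
        }
    }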