Building Data Streaming Applications with Apache Kafka

Author: Manish Kumar; Chanchal Singh

Publisher: Packt Publishing

Publication year: 2017

E-ISBN: 9781787287631

P-ISBN(Paperback): 9781787283985

Subject: TP312 Programming languages, algorithmic languages

Keyword: Programming languages, algorithmic languages; data processing, data processing systems

Language: ENG




Description

Design and administer fast, reliable enterprise messaging systems with Apache Kafka

About This Book

• Build efficient real-time streaming applications in Apache Kafka to process streams of data

• Master the core Kafka APIs to set up Apache Kafka clusters and start writing message producers and consumers

• A comprehensive guide to help you get a solid grasp of Apache Kafka concepts with practical examples

Who This Book Is For

If you want to learn how to use Apache Kafka and the different tools in the Kafka ecosystem in the easiest possible manner, this book is for you. Some programming experience with Java is required to get the most out of this book.

What You Will Learn

• Learn the basics of Apache Kafka from scratch

• Use the basic building blocks of a streaming application

• Design effective streaming applications with Kafka using Spark, Storm, and Heron

• Understand the importance of a low-latency, high-throughput, and fault-tolerant messaging system

• Plan capacity effectively when deploying your Kafka application

• Understand and implement the best security practices

In Detail

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records, and process them in a fault-tolerant way as they occur. This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using…

Chapter

Chapter 1: Introduction to Messaging Systems

Understanding the principles of messaging systems

Understanding messaging systems

Peeking into a point-to-point messaging system

Publish-subscribe messaging system

Advanced Message Queuing Protocol

Using messaging systems in big data streaming applications

Summary

Chapter 2: Introducing Kafka the Distributed Messaging Platform

Kafka origins

Kafka's architecture

Message topics

Message partitions

Replication and replicated logs

Message producers

Message consumers

Role of Zookeeper

Summary

Chapter 3: Deep Dive into Kafka Producers

Kafka producer internals

Kafka Producer APIs

Producer object and ProducerRecord object

Custom partition

Additional producer configuration

Java Kafka producer example

Common messaging publishing patterns

Best practices

Summary

Chapter 4: Deep Dive into Kafka Consumers

Kafka consumer internals

Understanding the responsibilities of Kafka consumers

Kafka consumer APIs

Consumer configuration

Subscription and polling

Committing and polling

Additional configuration

Java Kafka consumer

Scala Kafka consumer

Rebalance listeners

Common message consuming patterns

Best practices

Summary

Chapter 5: Building Spark Streaming Applications with Kafka

Introduction to Spark 

Spark architecture

Pillars of Spark

The Spark ecosystem

Spark Streaming 

Receiver-based integration

Disadvantages of receiver-based approach

Java example for receiver-based integration

Scala example for receiver-based integration

Direct approach

Java example for direct approach

Scala example for direct approach

Use case log processing - fraud IP detection

Maven

Producer 

Property reader

Producer code 

Fraud IP lookup

Expose Hive table

Streaming code

Summary

Chapter 6: Building Storm Applications with Kafka

Introduction to Apache Storm

Storm cluster architecture

The concept of a Storm application

Introduction to Apache Heron

Heron architecture 

Heron topology architecture

Integrating Apache Kafka with Apache Storm - Java

Example

Integrating Apache Kafka with Apache Storm - Scala

Use case – log processing in Storm, Kafka, Hive

Producer

Producer code 

Fraud IP lookup

Running the project

Summary

Chapter 7: Using Kafka with Confluent Platform

Introduction to Confluent Platform

Deep diving into Confluent architecture

Understanding Kafka Connect and Kafka Stream

Kafka Streams

Playing with Avro using Schema Registry

Moving Kafka data to HDFS

Camus 

Running Camus

Gobblin

Gobblin architecture

Kafka Connect

Flume

Summary

Chapter 8: Building ETL Pipelines Using Kafka

Considerations for using Kafka in ETL pipelines

Introducing Kafka Connect

Deep dive into Kafka Connect

Introductory examples of using Kafka Connect

Kafka Connect common use cases

Summary 

Chapter 9: Building Streaming Applications Using Kafka Streams

Introduction to Kafka Streams

Using Kafka in Stream processing

Kafka Stream - lightweight Stream processing library 

Kafka Stream architecture 

Integrated framework advantages

Understanding tables and Streams together

Maven dependency

Kafka Stream word count

KTable

Use case example of Kafka Streams

Maven dependency of Kafka Streams

Property reader

IP record producer

IP lookup service

Fraud detection application

Summary

Chapter 10: Kafka Cluster Deployment

Kafka cluster internals

Role of Zookeeper

Replication

Metadata request processing

Producer request processing

Consumer request processing

Capacity planning

Capacity planning goals

Replication factor

Memory

Hard drives

Network

CPU

Single cluster deployment

Multicluster deployment

Decommissioning brokers

Data migration

Summary

Chapter 11: Using Kafka in Big Data Applications

Managing high volumes in Kafka

Appropriate hardware choices 

Producer read and consumer write choices

Kafka message delivery semantics

At least once delivery 

At most once delivery 

Exactly once delivery 

Big data and Kafka common usage patterns

Kafka and data governance

Alerting and monitoring

Useful Kafka metrics

Producer metrics

Broker metrics

Consumer metrics

Summary

Chapter 12: Securing Kafka

An overview of securing Kafka

Wire encryption using SSL

Steps to enable SSL in Kafka

Configuring SSL for Kafka Broker

Configuring SSL for Kafka clients

Kerberos SASL for authentication

Steps to enable SASL/GSSAPI in Kafka

Configuring SASL for Kafka broker

Configuring SASL for Kafka client - producer and consumer

Understanding ACL and authorization

Common ACL operations

List ACLs

Understanding Zookeeper authentication

Apache Ranger for authorization

Adding Kafka Service to Ranger

Adding policies 

Best practices

Summary

Chapter 13: Streaming Application Design Considerations

Latency and throughput

Data and state persistence

Data sources

External data lookups

Data formats

Data serialization

Level of parallelism

Out-of-order events

Message processing semantics

Summary

Index
