Chapter 1: Introduction to Spark Distributed Processing
Introduction to Spark and Resilient Distributed Datasets
Resilient Distributed Datasets
Python Shell and SparkContext
RDD Creation from External Data Sources
Exercise 1: Basic Interactive Analysis with Python
Operations Supported by the RDD API
Working with Key-Value Pairs
Exercise 2: Map-Reduce Operations
Activity 1: Statistical Operations on Books
Self-Contained Python Spark Programs
Introduction to Functional Programming
Exercise 3: Standalone Python Programs
Introduction to SQL, Datasets, and DataFrames
Exercise 4: Downloading the Reduced Version of the MovieLens Dataset
Exercise 5: RDD Operations in DataFrame Objects
Chapter 2: Introduction to Spark Streaming
Introduction to Streaming Architectures
Back-Pressure, Write-Ahead Logging, and Checkpointing
Introduction to Discretized Streams
Consuming Streams from a TCP Socket
Map-Reduce Operations over DStreams
Exercise 6: Building an Event TCP Server
Activity 2: Building a Simple TCP Spark Stream Consumer
Parallel Recovery of State with Checkpointing
Keeping the State in Streaming Applications
Exercise 7: TCP Stream Consumer from Multiple Sources
Activity 3: Consuming Event Data from Three TCP Servers
Exercise 8: Distributed Log Server
Introduction to Structured Streaming
Result Table and Output Modes in Structured Streaming
Exercise 9: Writing Random Ratings
Exercise 10: Structured Streaming
Chapter 3: Spark Streaming Integration with AWS
Spark Integration with AWS Services
AWS Kinesis Data Streams Basic Functionality
Integrating AWS Kinesis and Python
Exercise 11: Listing Existing Streams
Exercise 12: Creating a New Stream
Exercise 13: Deleting an Existing Stream
Exercise 14: Pushing Data to a Stream
AWS S3 Basic Functionality
Creating, Listing, and Deleting AWS S3 Buckets
Exercise 15: Listing Existing Buckets
Exercise 16: Creating a Bucket
Exercise 17: Deleting a Bucket
Kinesis Streams and Spark Streams
Activity 4: AWS and Spark Pipeline
Chapter 4: Spark Streaming, ML, and Windowing Operations
Spark Integration with Machine Learning
Introduction to Recommendation Systems and Collaborative Filtering
Exercise 18: Collaborative Filtering and Spark
Exercise 19: Creating a TCP Server that Publishes User Ratings
Exercise 20: Spark Streams Integration with Machine Learning
Activity 5: Experimenting with Windowing Operations