Examining the Various Hadoop Offerings
Chapter 2: Common Use Cases for Big Data in Hadoop
The Keys to Successfully Adopting Hadoop (Or, “Please, Can We Keep Him?”)
Data Warehouse Modernization
Social Sentiment Analysis
Chapter 3: Setting Up Your Hadoop Environment
Choosing a Hadoop Distribution
Choosing a Hadoop Cluster Architecture
The Hadoop For Dummies Environment
Your First Hadoop Program: Hello Hadoop!
Part II: How Hadoop Works
Chapter 4: Storing Data in Hadoop: The Hadoop Distributed File System
Sketching Out the HDFS Architecture
Chapter 5: Reading and Writing Data
Managing Files with the Hadoop File System Commands
Ingesting Log Data with Flume
Chapter 6: MapReduce Programming
Seeing the Importance of MapReduce
Doing Things in Parallel: Breaking Big Problems into Many Bite-Size Pieces
Writing MapReduce Applications
Getting Your Feet Wet: Writing a Simple MapReduce Application
Chapter 7: Frameworks for Processing Data in Hadoop: YARN and MapReduce
Running Applications Before Hadoop 2
Seeing a World beyond MapReduce
Real-Time and Streaming Applications
Chapter 8: Pig: Hadoop Programming Made Easier
Admiring the Pig Architecture
Going with the Pig Latin Application Flow
Working through the ABCs of Pig Latin
Evaluating Local and Distributed Modes of Running Pig Scripts
Checking Out the Pig Script Interfaces
Chapter 9: Statistical Analysis in Hadoop
Pumping Up Your Statistical Analysis
Machine Learning with Mahout
Chapter 10: Developing and Scheduling Application Workflows with Oozie
Developing and Running an Oozie Workflow
Scheduling and Coordinating Oozie Workflows
Part III: Hadoop and Structured Data
Chapter 11: Hadoop and the Data Warehouse: Friends or Foes?
Comparing and Contrasting Hadoop with Relational Databases
Modernizing the Warehouse with Hadoop
Chapter 12: Extremely Big Tables: Storing Data in HBase
Understanding the HBase Data Model
Understanding the HBase Architecture
Taking HBase for a Test Run
Getting Things Done with HBase
HBase and the RDBMS World
Deploying and Tuning HBase
Chapter 13: Applying Structure to Hadoop Data with Hive
Seeing How the Hive Is Put Together
Getting Started with Apache Hive
Examining the Hive Clients
Working with Hive Data Types
Creating and Managing Databases and Tables
Seeing How the Hive Data Manipulation Language Works
Querying and Analyzing Data
Chapter 14: Integrating Hadoop with Relational Databases Using Sqoop
The Principles of Sqoop Design
Scooping Up Data with Sqoop
Sending Data Elsewhere with Sqoop
Looking at Your Sqoop Input and Output Formatting Options
Chapter 15: The Holy Grail: Native SQL Access to Hadoop Data
SQL’s Importance for Hadoop
Looking at What SQL Access Actually Means
SQL Access and Apache Hive
Solutions Inspired by Google Dremel
The SQL Access Big Picture
Part IV: Administering and Configuring Hadoop
Chapter 16: Deploying Hadoop
Working with Hadoop Cluster Components
Hadoop Cluster Configurations
Alternate Deployment Form Factors
Sizing Your Hadoop Cluster
Chapter 17: Administering Your Hadoop Cluster
Achieving Balance: A Big Factor in Cluster Health
Mastering the Hadoop Administration Commands
Understanding Factors for Performance
Tolerating Faults and Data Reliability
Putting Apache Hadoop’s Capacity Scheduler to Good Use
Setting Security: The Kerberos Protocol
Expanding Your Toolset Options
Basic Hadoop Configuration Details
Chapter 18: Ten Hadoop Resources Worthy of a Bookmark
Central Nervous System: Apache.org
Planet Big Data Blog Aggregator
Quora’s Apache Hadoop Forum
Conferences Not to Be Missed
The Google Papers That Started It All
The Bonus Resource: What Did We Ever Do B.G.?
Chapter 19: Ten Reasons to Adopt Hadoop
Hadoop Is Relatively Inexpensive
Hadoop Has an Active Open Source Community
Hadoop Is Being Widely Adopted in Every Industry
Hadoop Can Easily Scale Out As Your Data Grows
Traditional Tools Are Integrating with Hadoop
Hadoop Can Store Data in Any Format
Hadoop Is Designed to Run Complex Analytics
Hadoop Can Process a Full Data Set (As Opposed to Sampling)
Hardware Is Being Optimized for Hadoop
Hadoop Can Increasingly Handle Flexible Workloads (No Longer Just Batch)