Hadoop For Dummies

Author: Dirk deRoos

Publisher: John Wiley & Sons Inc

Publication year: 2014

E-ISBN: 9781118705032

P-ISBN(Paperback): 9781118607558

P-ISBN(Hardback): 9781118607558

Subject: F2 Economic Planning and Management; F224-39 Computer Applications

Language: ENG

Description

Let Hadoop For Dummies help harness the power of your data and rein in the information overload

Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.

Explains the origins of Hadoop, its economic benefits, and its functionality and practical applications
Helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop cluster up and running quickly and easily
Details how to use Hadoop applications for data mining, web analytics and personalization, large-scale text processing, data science, and problem-solving
Shows you how to improve the value of your Hadoop cluster, maximize your investment in Hadoop, and avoid common pitfalls when building your Hadoop cluster

From programmers challenged with building and maintaining affordable, scalable data systems to administrators who must deal with huge volumes of information effectively and efficiently, this how-to has something to help you with Hadoop.

Chapter

How This Book Is Organized

Icons Used in This Book

Beyond the Book

Where to Go from Here

Part I: Getting Started with Hadoop

Chapter 1: Introducing Hadoop and Seeing What It’s Good For

Big Data and the Need for Hadoop

The Origin and Design of Hadoop

Examining the Various Hadoop Offerings

Chapter 2: Common Use Cases for Big Data in Hadoop

The Keys to Successfully Adopting Hadoop (Or, “Please, Can We Keep Him?”)

Log Data Analysis

Data Warehouse Modernization

Fraud Detection

Risk Modeling

Social Sentiment Analysis

Image Classification

Graph Analysis

To Infinity and Beyond

Chapter 3: Setting Up Your Hadoop Environment

Choosing a Hadoop Distribution

Choosing a Hadoop Cluster Architecture

The Hadoop For Dummies Environment

Your First Hadoop Program: Hello Hadoop!

Part II: How Hadoop Works

Chapter 4: Storing Data in Hadoop: The Hadoop Distributed File System

Data Storage in HDFS

Sketching Out the HDFS Architecture

HDFS Federation

HDFS High Availability

Chapter 5: Reading and Writing Data

Compressing Data

Managing Files with the Hadoop File System Commands

Ingesting Log Data with Flume

Chapter 6: MapReduce Programming

Thinking in Parallel

Seeing the Importance of MapReduce

Doing Things in Parallel: Breaking Big Problems into Many Bite-Size Pieces

Writing MapReduce Applications

Getting Your Feet Wet: Writing a Simple MapReduce Application

Chapter 7: Frameworks for Processing Data in Hadoop: YARN and MapReduce

Running Applications Before Hadoop 2

Seeing a World beyond MapReduce

Real-Time and Streaming Applications

Chapter 8: Pig: Hadoop Programming Made Easier

Admiring the Pig Architecture

Going with the Pig Latin Application Flow

Working through the ABCs of Pig Latin

Evaluating Local and Distributed Modes of Running Pig scripts

Checking Out the Pig Script Interfaces

Scripting with Pig Latin

Chapter 9: Statistical Analysis in Hadoop

Pumping Up Your Statistical Analysis

Machine Learning with Mahout

R on Hadoop

Chapter 10: Developing and Scheduling Application Workflows with Oozie

Getting Oozie in Place

Developing and Running an Oozie Workflow

Scheduling and Coordinating Oozie Workflows

Part III: Hadoop and Structured Data

Chapter 11: Hadoop and the Data Warehouse: Friends or Foes?

Comparing and Contrasting Hadoop with Relational Databases

Modernizing the Warehouse with Hadoop

Chapter 12: Extremely Big Tables: Storing Data in HBase

Say Hello to HBase

Understanding the HBase Data Model

Understanding the HBase Architecture

Taking HBase for a Test Run

Getting Things Done with HBase

HBase and the RDBMS world

Deploying and Tuning HBase

Chapter 13: Applying Structure to Hadoop Data with Hive

Saying Hello to Hive

Seeing How the Hive is Put Together

Getting Started with Apache Hive

Examining the Hive Clients

Working with Hive Data Types

Creating and Managing Databases and Tables

Seeing How the Hive Data Manipulation Language Works

Querying and Analyzing Data

Chapter 14: Integrating Hadoop with Relational Databases Using Sqoop

The Principles of Sqoop Design

Scooping Up Data with Sqoop

Sending Data Elsewhere with Sqoop

Looking at Your Sqoop Input and Output Formatting Options

Sqoop 2.0 Preview

Chapter 15: The Holy Grail: Native SQL Access to Hadoop Data

SQL’s Importance for Hadoop

Looking at What SQL Access Actually Means

SQL Access and Apache Hive

Solutions Inspired by Google Dremel

IBM Big SQL

Pivotal HAWQ

Hadapt

The SQL Access Big Picture

Part IV: Administering and Configuring Hadoop

Chapter 16: Deploying Hadoop

Working with Hadoop Cluster Components

Hadoop Cluster Configurations

Alternate Deployment Form Factors

Sizing Your Hadoop Cluster

Chapter 17: Administering Your Hadoop Cluster

Achieving Balance: A Big Factor in Cluster Health

Mastering the Hadoop Administration Commands

Understanding Factors for Performance

Tolerating Faults and Data Reliability

Putting Apache Hadoop’s Capacity Scheduler to Good Use

Setting Security: The Kerberos Protocol

Expanding Your Toolset Options

Basic Hadoop Configuration Details

Part V: The Part of Tens

Chapter 18: Ten Hadoop Resources Worthy of a Bookmark

Central Nervous System: Apache.org

Tweet This

Hortonworks University

Cloudera University

BigDataUniversity.com

planet Big Data Blog Aggregator

Quora’s Apache Hadoop Forum

The IBM Big Data Hub

Conferences Not to Be Missed

The Google Papers That Started It All

The Bonus Resource: What Did We Ever Do B.G.?

Chapter 19: Ten Reasons to Adopt Hadoop

Hadoop Is Relatively Inexpensive

Hadoop Has an Active Open Source Community

Hadoop Is Being Widely Adopted in Every Industry

Hadoop Can Easily Scale Out As Your Data Grows

Traditional Tools Are Integrating with Hadoop

Hadoop Can Store Data in Any Format

Hadoop Is Designed to Run Complex Analytics

Hadoop Can Process a Full Data Set (As Opposed to Sampling)

Hardware Is Being Optimized for Hadoop

Hadoop Can Increasingly Handle Flexible Workloads (No Longer Just Batch)

Index

About the Authors
