Big Data Architect’s Handbook

Author: Syed Muhammad Fahad Akhtar  

Publisher: Packt Publishing

Publication year: 2018

E-ISBN: 9781788836388

P-ISBN(Paperback): 89543100163820

Subject: TP274 Data processing, data processing systems

Language: ENG




Chapter

Chapter 1: Why Big Data?

What is big data?

Characteristics of big data

Volume

Velocity

Variety

Veracity

Variability

Value

Solution-based approach for data

Data – the most valuable asset

Traditional approaches to data storage

Clustered computing

High availability

Resource pooling

Easy scalability

Big data – how does it make a difference?

Big data solutions – cloud versus on-premises infrastructure

Cost

Security

Current capabilities

Scalability

Big data glossary

Big data

Batch processing

Cluster computing

Data warehouse

Data lake

Data mining

ETL

Hadoop

In-memory computing

Machine learning

MapReduce

NoSQL

Stream processing

Summary

Chapter 2: Big Data Environment Setup

Oracle VM VirtualBox installation

Ubuntu installation

Hadoop prerequisite installation

Java installation

SSH installation and configuration

Hadoop system user

Apache Hadoop installation

Hadoop configuration

Path configuration for Hadoop commands

Hadoop server start and stop

Summary

Chapter 3: Hadoop Ecosystem

Apache Hadoop

Hadoop Distributed File System

HDFS hands-on

Creating a directory in HDFS

Copying files from a local file system to HDFS

Copying files from HDFS to a local file system

Deleting files and folders in HDFS

Hadoop MapReduce

Job Tracker and Task Tracker

The execution flow of MapReduce 

Mapper

Shuffle and Sort

Reducer

Example program

Preparing the data file for analysis

Program code

Driver program

Mapper program

Reducer program

Observations and results

YARN

Resource Manager

Node Manager

Container

Application Master

Apache Projects related to big data

Apache Zookeeper

Apache Kafka

Apache Flume

Apache Cassandra

Apache HBase

Apache Spark

Summary

Chapter 4: NoSQL Database

What is NoSQL?

Benefits of NoSQL databases

NoSQL versus RDBMS

The CAP theorem

The ACID properties

Data models in NoSQL

Key-value data stores

Document store

Column stores

Graph stores

Apache Cassandra

Installation

Starting Cassandra

The Cassandra Query Language – CQL

The help command

Basic commands

Data manipulation

Creating, altering, and deleting a keyspace

Creating, altering, and deleting tables

Inserting, updating, and deleting data

The MongoDB database

Installing MongoDB

Starting MongoDB

Working on MongoDB

The help command

Basic commands

Data manipulation

Creating and deleting databases

Creating and deleting collections

The create, retrieve, update, delete operations

Neo4j database

Installing Neo4j

Starting Neo4j

The cypher query language

Help

Basic operations in Cypher

Creating nodes, relationships, and properties

Updating nodes, relationships, and properties

Deleting nodes, relationships, and properties

Reading nodes, relationships, and properties

Summary

Chapter 5: Off-the-Shelf Commercial Tools

Microsoft Azure

Building a practical application

Microsoft Azure account

The Azure Event Hub

IoT simulation application

Setting up an Azure Stream Analytics job

Input

Query 

Output

Dashboard in Power BI

Summary

Chapter 6: Containerization

Virtualization

Hypervisors

Hardware-based hypervisors

Software-based hypervisors

What is containerization?

Benefits of containers

Docker

Docker workflow

Installation

Basic commands

Docker images

Building a Docker image

Running and verifying Docker images

Importing and exporting Docker images

Docker Swarm

Setting up Docker Swarm

Creating service containers

Replicating containers

Removing container services

Kubernetes

Key components

Pods

ReplicaSets

Deployments

PetSets

Installation

Deployment

Kubernetes Dashboard

Summary

Chapter 7: Network Infrastructure

Network

Local area networks

Metropolitan area networks

Wide area networks

Network connectivity

Wired

Wireless

Network visualization

Gephi

Installation

Java installation

First run

Practical example

Summary

Chapter 8: Cloud Infrastructure

Companies moving to cloud 

Driving factors

Infrastructure

Locality of data

Requirements

Design considerations

Open source versus commercial

Commodity hardware versus purpose build

Cloud versus on-premises

Scale up and down

Application architecture

Cost decision

Summary

Chapter 9: Security and Monitoring

Simple Network Management Protocol

Benefits of SNMP

Security

Agents and Traps

Netflow

Nagios

Key benefits

Security Onion

Deployment scenarios

The Standalone model

The Server-Sensor model

Hybrid model

Preconfigured tools

Wireshark

Key features

Summary

Chapter 10: Frontend Architecture

React JS

Key concepts 

Node.js

JSX

Unidirectional dataflow

Getting started with ReactJS

Single page application

React application project

React app directory structure

Components

Properties

Event handling

State

Redux

Architecture of Redux

Key concepts

Single store

Action

Reducers

Guestbook application

Installation

Create a store

Setting up Reducer

Setting up Dispatcher

Connect function

Setting up Subscribers

Final output

Summary

Chapter 11: Backend Architecture

API

RESTful API

HTTP request methods

GET

POST

PUT

DELETE

Authentication

Basic authentication

JSON Web Token

Header

Payload

Signature

Practical

RESTful web service

Java client

Redis

Installation

Redis server

Redis client

Working with Redis

Redis data types and structures

String

HashMap

List

Set

Redis Publish/Subscribe

Common key operations

Summary

Chapter 12: Machine Learning

Machine learning

Types of algorithms

Parametric algorithms

Non-parametric algorithms

Supervised learning

The classification model

Binary classification 

Multi-class classification

The regression model

Linear regression

Polynomial regression

Unsupervised learning

Clustering, k-means

Neural networks

Feedforward neural network

Recurrent neural network

Symmetrically connected neural network

Deep neural networks

Decision tree classifiers

Summary

Chapter 13: Artificial Intelligence

Artificial intelligence

Convolutional neural networks

Deep learning using TensorFlow

TensorFlow

Installation

TensorFlow program

Uninstalling TensorFlow

TensorBoard

Program

Launching TensorBoard

TensorBoard graph

Object detection using YOLO

Installation

Compiling YOLO library

Trained weights

Detecting objects in an image

Summary

Chapter 14: Elasticsearch

Installing Elasticsearch

Starting the Elasticsearch server

Auto starting the Elasticsearch service

Stopping the Elasticsearch server

Uninstalling Elasticsearch

Kibana

Installation

Starting Kibana

Uninstalling Kibana

Security

Securing Elasticsearch

Securing Kibana

Understanding queries – CRUD commands

Creating

Reading

Updating

Deleting

Summary

Chapter 15: Structured Data

Data analysis

Installing MySQL

Importing data

Analyzing the data model

HBase

Installation

Starting an HBase instance

Stopping a HBase instance

Preparing an HBase for migration

Sqoop

Installation

Verifying the installation

MySQL JDBC driver

Importing data

Verifying the imported data

Summary

Chapter 16: Unstructured Data

Moving data into Hadoop

Downloading Flume

Environment configuration

Configuring agent and sink

Running Apache Flume

Transferring a log file

Converting images into text for analysis

Tesseract OCR

Installing Tesseract

Practical example

Complete code

Program execution

Summary

Chapter 17: Data Visualization

Matplotlib

Installing Matplotlib

Line chart

Bar charts

Stack charts

Scatter charts

Pie charts

Geographic projections

D3.js

Installation

Practical example

Output

Summary

Chapter 18: Financial Trading System

What is algorithmic trading?

Benefits of algorithmic trading

Big data in the financial market

Algorithmic trading strategies

Building an Expert Advisor

MetaTrader

Downloading and setting up MetaTrader

MetaQuotes language

Trading bot objective

Practical

Trading pattern – moving average

Decision time: buy or sell

Complete program

Backtesting in MetaTrader 4

Summary

Chapter 19: Retail Recommendation System

Types of recommendation system

Collaborative filtering

Content-based filtering

Demographic-based system

Utility-based system

Knowledge-based system

Hybrid model

Commercial tools

Barilliance

Softcube

Strands

Monetate

Nosto

Book recommendation system

Dataset

Directory structure

Code

Reading the dataset

Verifying the dataset

Data analysis

Age group

Commutative rating

Algorithms

Top-rated books

Popular books

Demographic-based recommendation

Useful resources

Summary

Other Books You May Enjoy

Index
