Chapter 1: Too Big or Not Too Big
Dawn of the information age
Dr. Alan Turing and modern computing
The advent of the stored-program computer
From magnetic devices to SSDs
Why we are talking about big data now if data has always existed
Building blocks of big data analytics
When do you know you have a big data problem and where do you start your search for the big data solution?
Chapter 2: Big Data Mining for the Masses
Big data mining in the enterprise
Building the case for a big data strategy
Implementation life cycle
Stakeholders of the solution
Implementing the solution
Technical elements of the big data platform
Selection of the hardware stack
Selection of the software stack
Chapter 3: The Analytics Toolkit
Components of the Analytics Toolkit
Installing on a laptop or workstation
Installing Oracle VirtualBox
Installing CDH in other environments
Installing Packt Data Science Box
Steps for downloading and installing Microsoft R Open
Chapter 4: Big Data with Hadoop
The fundamentals of Hadoop
The fundamental premise of Hadoop
The core modules of Hadoop
Hadoop Distributed File System - HDFS
Data storage process in HDFS
An intuitive introduction to MapReduce
A technical understanding of MapReduce
Block size and number of mappers and reducers
Hadoop data storage formats
New features expected in Hadoop 3
WordCount using Hadoop MapReduce
Analyzing oil import prices with Hive
Chapter 5: Big Data Mining with NoSQL
The ACID, BASE, and CAP properties
The BASE property of NoSQL
The need for NoSQL technologies
Document-oriented databases
Other NoSQL types and summary of other types of databases
Analyzing Nobel Laureates data with MongoDB
Installing and using MongoDB
Tracking physician payments with real-world data
Installing kdb+, R, and RStudio
The CMS Open Payments Portal
Downloading the CMS Open Payments data
Creating the Q application
Creating the frontend web portal
R Shiny platform for developers
Putting it all together - The CMS Open Payments application
Chapter 6: Spark for Big Data Analytics
Overcoming the limitations of Hadoop
Theoretical concepts in Spark
Resilient distributed datasets
Actions and transformations
The architecture of Spark
Signing up for Databricks Community Edition
Spark exercise - hands-on with Spark (Databricks)
Chapter 7: An Introduction to Machine Learning Concepts
What is machine learning?
The evolution of machine learning
Factors that led to the success of machine learning
Machine learning, statistics, and AI
Categories of machine learning
Supervised and unsupervised machine learning
Supervised machine learning
Vehicle mileage, number recognition, and other examples
Unsupervised machine learning
Subdividing supervised machine learning
Common terminologies in machine learning
The core concepts in machine learning
Data management steps in machine learning
Pre-processing and feature selection techniques
The near-zero variance function
Removing correlated variables
Other common data transformations
The importance of variables
The train, test splits, and cross-validation concepts
Splitting the data into train and test sets
The cross-validation parameter
Leveraging multicore processing in the model
Chapter 8: Machine Learning Deep Dive
The bias, variance, and regularization properties
The gradient descent and VC Dimension theories
Popular machine learning algorithms
The random forest extension
The K-Means machine learning technique
The neural networks related algorithms
Tutorial - association rule mining with CMS data
Writing the R code for Apriori
Using custom CSS and fonts for the application
Chapter 9: Enterprise Data Science
Enterprise data science overview
A roadmap to enterprise analytics success
Data science solutions in the enterprise
Enterprise data warehouse and data mining
Traditional data warehouse systems
Oracle Exadata, Exalytics, and TimesTen
IBM data warehouse systems (formerly Netezza appliances)
Enterprise and open source NoSQL databases
Amazon Redshift, Redshift Spectrum, and Athena databases
Google BigQuery and other cloud services
Enterprise data science – machine learning and AI
The R programming language
OpenCV, Caffe, and others
Machine learning as a service
Enterprise infrastructure solutions
Containers – Docker, Kubernetes, and Mesos
Tutorial – using RStudio in the cloud
Chapter 10: Closing Thoughts on Big Data
Corporate big data and data science strategy
Silicon Valley and data science
Characteristics of successful projects
Appendix: External Data Science Resources
Courses on machine learning
Machine learning and deep learning links
Web-based machine learning services
Machine learning books from Packt
Books for leisure reading
Other Books You May Enjoy
Leave a review - let other readers know what you think