Chapter
Applications and examples of predictive modelling
Python and its packages – download and installation
Python and its packages for predictive modelling
Reading the data – variations and examples
Various methods of importing data in Python
Basics – summary, dimensions, and structure
Visualizing a dataset by basic plotting
Chapter 3: Data Wrangling
Generating random numbers and their usage
Grouping the data – aggregation, filtering, and transformation
Random sampling – splitting a dataset into training and testing datasets
Concatenating and appending data
Chapter 4: Statistical Concepts for Predictive Modelling
Random sampling and the central limit theorem
Chapter 5: Linear Regression with Python
Understanding the maths behind linear regression
Making sense of result parameters
Implementing linear regression with Python
Handling other issues in linear regression
Chapter 6: Logistic Regression with Python
Linear regression versus logistic regression
Understanding the math behind logistic regression
Implementing logistic regression with Python
Model validation and evaluation
Chapter 7: Clustering with Python
Introduction to clustering – what, why, and how?
Mathematics behind clustering
Implementing clustering using Python
Fine-tuning the clustering
Chapter 8: Trees and Random Forests with Python
Introducing decision trees
Understanding the mathematics behind decision trees
Implementing a decision tree with scikit-learn
Understanding and implementing regression trees
Understanding and implementing random forests
Chapter 9: Best Practices for Predictive Modelling
Best practices for coding
Best practices for data handling
Best practices for algorithms
Best practices for statistics
Best practices for business contexts
Appendix: A List of Links
Module 2: Mastering Predictive Analytics with Python
Chapter 1: From Data to Decisions – Getting Started with Analytic Applications
Designing an advanced analytic solution
Case study: sentiment analysis of social media feeds
Case study: targeted e-mail campaigns
Chapter 2: Exploratory Data Analysis and Visualization in Python
Exploring categorical and numerical data in IPython
Working with geospatial data
Chapter 3: Finding Patterns in the Noise – Clustering and Unsupervised Learning
Similarity and distance metrics
Affinity propagation – automatically choosing cluster numbers
Streaming clustering in Spark
Chapter 4: Connecting the Dots with Models – Regression Methods
Scaling out with PySpark – predicting year of song release
Chapter 5: Putting Data in its Place – Classification Methods and Analysis
Evaluating classification models
Separating nonlinear boundaries with support vector machines
Comparing classification methods
Case study: fitting classifier models in PySpark
Chapter 6: Words and Pixels – Working with Unstructured Data
Working with textual data
Principal component analysis
Case study: training a recommender system in PySpark
Chapter 7: Learning from the Bottom Up – Deep Networks and Unsupervised Features
Learning patterns with neural networks
The TensorFlow library and digit recognition
Chapter 8: Sharing Models with Prediction Services
The architecture of a prediction service
Clients and making requests
Server – the web traffic controller
Persisting information with database systems
Case study – logistic regression service
Chapter 9: Reporting and Testing – Iterating on Analytic Systems
Checking the health of models with diagnostics
Iterating on models through A/B testing
Guidelines for communication