Introduction and First Steps – Take a Deep Breath
Satisfaction and enjoyment
Who is using Python today?
Setting up the environment
Python 2 versus Python 3 – the great debate
What you need for this course
Installing additional packages
How you can run a Python program
Running the Python interactive shell
Running Python as a service
Running Python as a GUI application
How is Python code organized?
How do we use modules and packages?
Guidelines on how to write good code
Introducing object-oriented
Specifying attributes and behaviors
Hiding details and creating the public interface
Inheritance provides abstraction
Organizing module contents
Different sets of arguments
Using an abstract base class
Creating an abstract base class
The effects of an exception
Defining our own exceptions
When to Use Object-oriented Programming
Adding behavior to class data with properties
Decorators – another way to create properties
Deciding when to use properties
Python Object-oriented Shortcuts
Python built-in functions
An alternative to method overloading
Functions are objects too
Using functions as attributes
Strings and Serialization
Matching a selection of characters
Matching multiple characters
Grouping patterns together
Getting information from regular expressions
Making repeated regular expressions efficient
Set and dictionary comprehensions
Yield items from another iterable
Closing coroutines and throwing exceptions
The relationship between coroutines, generators, and functions
State transition as coroutines
Python Design Patterns II
The abstract factory pattern
Testing Object-oriented Programs
Reducing boilerplate and cleaning up
Organizing and running tests
One way to do setup and cleanup
A completely different way to set up variables
Skipping tests with py.test
Imitating expensive objects
How much testing is enough?
The many problems with threads
The global interpreter lock
The problems with multiprocessing
Reading an AsyncIO future
Using executors to wrap blocking code
Introducing Data Analysis and Libraries
Data analysis and processing
An overview of the libraries in data analysis
Python libraries in data analysis
NumPy Arrays and Vectorized Computation
Numerical operations on arrays
Data processing using arrays
Linear algebra with NumPy
Data Analysis with pandas
An overview of the pandas package
The pandas data structure
The essential basic functionality
Reindexing and altering labels
Indexing and selecting data
Working with missing data
Advanced uses of pandas for data analysis
The matplotlib API primer
Plotting functions with pandas
Additional Python data visualization tools
Working with date and time objects
Downsampling time series data
Upsampling time series data
Interacting with Databases
Interacting with data in text format
Reading data from text format
Writing data to text format
Interacting with data in binary format
Interacting with data in MongoDB
Interacting with data in Redis
Data Analysis Application Examples
Getting Started with Data Mining
A simple affinity analysis example
What is affinity analysis?
Loading the dataset with NumPy
Implementing a simple ranking of rules
Ranking to find the best rules
A simple classification example
Loading and preparing the dataset
Implementing the OneR algorithm
Classifying with scikit-learn Estimators
Moving towards a standard workflow
Preprocessing using pipelines
Predicting Sports Winners with Decision Trees
Using pandas to load the dataset
Parameters in decision trees
Sports outcome prediction
Parameters in Random forests
Recommending Movies Using Affinity Analysis
Algorithms for affinity analysis
The movie recommendation problem
The Apriori implementation
Extracting association rules
Extracting Features with Transformers
Representing reality in models
Selecting the best individual features
Creating your own transformer
Social Media Insight Using Naive Bayes
Downloading data from a social network
Loading and classifying the dataset
Creating a replicable dataset from Twitter
Converting dictionaries to a matrix
Training the Naive Bayes classifier
Evaluation using the F1-score
Getting useful features from models
Discovering Accounts to Follow Using Graph Mining
Classifying with an existing model
Getting follower information from Twitter
Creating a similarity graph
Beating CAPTCHAs with Neural Networks
Artificial neural networks
An introduction to neural networks
Splitting the image into individual letters
Creating a training dataset
Adjusting our training dataset to our methodology
Improving accuracy using a dictionary
Ranking mechanisms for words
Attributing documents to authors
Applications and use cases
Classifying with function words
Extracting character n-grams
Accessing the Enron dataset
Creating a dataset loader
Using a Web API to get data
Extracting text from arbitrary websites
Finding the stories in arbitrary websites
Extracting topic information from clusters
Using clustering algorithms as transformers
An introduction to online learning
Classifying Objects in Images Using Deep Learning
Application scenario and goals
An introduction to Theano
An introduction to Lasagne
Implementing neural networks with nolearn
When to use GPUs for computation
Running our code on a GPU
Setting up the environment
Creating the neural network
Application scenario and goals
Extracting the blog posts
Training on Amazon's EMR infrastructure
Chapter 1 – Getting Started with Data Mining
Extending the IPython Notebook
Chapter 2 – Classifying with scikit-learn Estimators
Chapter 3 – Predicting Sports Winners with Decision Trees
Chapter 4 – Recommending Movies Using Affinity Analysis
Chapter 5 – Extracting Features with Transformers
Chapter 6 – Social Media Insight Using Naive Bayes
Natural language processing and part-of-speech tagging
Chapter 7 – Discovering Accounts to Follow Using Graph Mining
Chapter 8 – Beating CAPTCHAs with Neural Networks
Chapter 9 – Authorship Attribution
Chapter 10 – Clustering News Articles
Chapter 11 – Classifying Objects in Images Using Deep Learning
Chapter 12 – Working with Big Data
Giving Computers the Ability to Learn from Data
How to transform data into knowledge
The three different types of machine learning
Making predictions about the future with supervised learning
Classification for predicting class labels
Regression for predicting continuous outcomes
Solving interactive problems with reinforcement learning
Discovering hidden structures with unsupervised learning
Finding subgroups with clustering
Dimensionality reduction for data compression
An introduction to the basic terminology and notations
A roadmap for building machine learning systems
Preprocessing – getting data into shape
Training and selecting a predictive model
Evaluating models and predicting unseen data instances
Using Python for machine learning
Training Machine Learning Algorithms for Classification
Artificial neurons – a brief glimpse into the early history of machine learning
Implementing a perceptron learning algorithm in Python
Training a perceptron model on the Iris dataset
Adaptive linear neurons and the convergence of learning
Minimizing cost functions with gradient descent
Implementing an Adaptive Linear Neuron in Python
Large-scale machine learning and stochastic gradient descent
A Tour of Machine Learning Classifiers Using scikit-learn
Choosing a classification algorithm
First steps with scikit-learn
Training a perceptron via scikit-learn
Modeling class probabilities via logistic regression
Logistic regression intuition and conditional probabilities
Learning the weights of the logistic cost function
Training a logistic regression model with scikit-learn
Tackling overfitting via regularization
Maximum margin classification with support vector machines
Dealing with the nonlinearly separable case using slack variables
Alternative implementations in scikit-learn
Solving nonlinear problems using a kernel SVM
Using the kernel trick to find separating hyperplanes in higher dimensional space
Maximizing information gain – getting the most bang for the buck
Combining weak to strong learners via random forests
K-nearest neighbors – a lazy learning algorithm
Building Good Training Sets – Data Preprocessing
Dealing with missing data
Eliminating samples or features with missing values
Understanding the scikit-learn estimator API
Handling categorical data
Performing one-hot encoding on nominal features
Partitioning a dataset into training and test sets
Bringing features onto the same scale
Selecting meaningful features
Sparse solutions with L1 regularization
Sequential feature selection algorithms
Assessing feature importance with random forests
Compressing Data via Dimensionality Reduction
Unsupervised dimensionality reduction via principal component analysis
Total and explained variance
Principal component analysis in scikit-learn
Supervised data compression via linear discriminant analysis
Computing the scatter matrices
Selecting linear discriminants for the new feature subspace
Projecting samples onto the new feature space
Using kernel principal component analysis for nonlinear mappings
Kernel functions and the kernel trick
Implementing a kernel principal component analysis in Python
Example 1 – separating half-moon shapes
Example 2 – separating concentric circles
Projecting new data points
Kernel principal component analysis in scikit-learn
Learning Best Practices for Model Evaluation and Hyperparameter Tuning
Streamlining workflows with pipelines
Loading the Breast Cancer Wisconsin dataset
Combining transformers and estimators in a pipeline
Using k-fold cross-validation to assess model performance
Debugging algorithms with learning and validation curves
Diagnosing bias and variance problems with learning curves
Addressing overfitting and underfitting with validation curves
Fine-tuning machine learning models via grid search
Tuning hyperparameters via grid search
Algorithm selection with nested cross-validation
Looking at different performance evaluation metrics
Reading a confusion matrix
Optimizing the precision and recall of a classification model
Plotting a receiver operating characteristic
The scoring metrics for multiclass classification
Combining Different Models for Ensemble Learning
Implementing a simple majority vote classifier
Combining different algorithms for classification with majority vote
Evaluating and tuning the ensemble classifier
Bagging – building an ensemble of classifiers from bootstrap samples
Leveraging weak learners via adaptive boosting
Predicting Continuous Target Variables with Regression Analysis
Introducing a simple linear regression model
Exploring the Housing Dataset
Visualizing the important characteristics of a dataset
Implementing an ordinary least squares linear regression model
Solving regression for regression parameters with gradient descent
Estimating the coefficient of a regression model via scikit-learn
Fitting a robust regression model using RANSAC
Evaluating the performance of linear regression models
Using regularized methods for regression
Turning a linear regression model into a curve – polynomial regression
Modeling nonlinear relationships in the Housing Dataset
Dealing with nonlinear relationships using random forests
Reflect and Test Yourself!
Answers
Chapter 1: Introducing Data Analysis and Libraries
Chapter 2: Object-oriented Design
Chapter 3: Data Analysis with pandas
Chapter 4: Data Visualization
Chapter 6: Interacting with Databases
Chapter 7: Data Analysis Application Examples
Chapter 1: Getting Started with Data Mining
Chapter 2: Classifying with scikit-learn Estimators
Chapter 3: Predicting Sports Winners with Decision Trees
Chapter 4: Recommending Movies Using Affinity Analysis
Chapter 5: Extracting Features with Transformers
Chapter 6: Social Media Insight Using Naive Bayes
Chapter 7: Discovering Accounts to Follow Using Graph Mining
Chapter 8: Beating CAPTCHAs with Neural Networks
Chapter 9: Authorship Attribution
Chapter 10: Clustering News Articles
Chapter 11: Classifying Objects in Images Using Deep Learning
Chapter 12: Working with Big Data
Module 4: Machine Learning
Chapter 1: Giving Computers the Ability to Learn from Data
Chapter 2: Training Machine Learning Algorithms for Classification
Chapter 3: A Tour of Machine Learning Classifiers Using scikit-learn
Chapter 4: Building Good Training Sets – Data Preprocessing
Chapter 5: Compressing Data via Dimensionality Reduction
Chapter 6: Learning Best Practices for Model Evaluation and Hyperparameter Tuning
Chapter 7: Combining Different Models for Ensemble Learning
Chapter 8: Predicting Continuous Target Variables with Regression Analysis