Chapter 1: Machine Learning Review
Machine learning – history and definition
What is not machine learning?
Machine learning – concepts and terminology
Machine learning – types and subtypes
Datasets used in machine learning
Machine learning applications
Practical issues in machine learning
Machine learning – roles and process
Machine learning – tools and datasets
Chapter 2: Practical Approach to Real-World Supervised Learning
Formal description and notation
Descriptive data analysis
Univariate feature analysis
Multivariate feature analysis
Data transformation and preprocessing
Undersampling and oversampling
Training, validation, and test set
Feature relevance analysis and dimensionality reduction
Feature search techniques
Feature evaluation techniques
K-Nearest Neighbors (KNN)
Support vector machines (SVM)
Ensemble learning and meta learners
Bootstrap aggregating or bagging
Model assessment, evaluation, and comparisons
Confusion matrix and related metrics
Gain charts and lift curves
Comparing multiple algorithms
Case Study – Horse Colic Classification
Supervised learning experiments
Results, observations, and analysis
Chapter 3: Unsupervised Machine Learning Techniques
Issues in common with supervised learning
Issues specific to unsupervised learning
Feature analysis and dimensionality reduction
Principal component analysis (PCA)
Multidimensional Scaling (MDS)
Kernel Principal Component Analysis (KPCA)
Expectation maximization (EM) or Gaussian mixture modeling (GMM)
Self-organizing maps (SOM)
Clustering validation and evaluation
Internal evaluation measures
External evaluation measures
Outlier or anomaly detection
High-dimensional-based methods
Outlier evaluation techniques
Data sampling and transformation
Feature analysis and dimensionality reduction
Observations on feature analysis and dimensionality reduction
Clustering models, results, and evaluation
Observations and clustering analysis
Outlier models, results, and evaluation
Chapter 4: Semi-Supervised and Active Learning
Representation, notation, and assumptions
Semi-supervised learning techniques
Co-training SSL or multi-view SSL
Transductive graph label propagation
Case study in semi-supervised learning
Data sampling and transformation
Representation and notation
Active learning scenarios
Active learning approaches
Query by disagreement (QBD)
Advantages and limitations
Data distribution sampling
Advantages and limitations
Case study in active learning
Data sampling and transformation
Feature analysis and dimensionality reduction
Models, results, and evaluation
Analysis of active learning results
Chapter 5: Real-Time Stream Machine Learning
Assumptions and mathematical notations
Basic stream processing and computational techniques
Concept drift and drift detection
Incremental supervised learning
Validation, evaluation, and comparisons in online setting
Model validation techniques
Incremental unsupervised learning using clustering
Hierarchical based and micro clustering
Validation and evaluation techniques
Unsupervised learning using outlier detection
Partition-based clustering for outlier detection
Advantages and limitations
Distance-based clustering for outlier detection
Validation and evaluation techniques
Case study in stream learning
Data sampling and transformation
Feature analysis and dimensionality reduction
Models, results, and evaluation
Supervised learning experiments
Outlier detection experiments
Analysis of stream learning results
Chapter 6: Probabilistic Graph Modeling
Chain rule and Bayes' theorem
Random variables, joint, and marginal distributions
Marginal independence and conditional independence
Graph structure and properties
Independencies, flow of influence, D-Separation, I-Map
Elimination-based inference
Propagation-based techniques
Sampling-based techniques
Markov networks and conditional random fields
Conditional random fields
Advantages and limitations
Most probable path in HMM
Posterior decoding in HMM
Weka Bayesian Network GUI
Data sampling and transformation
Models, results, and evaluation
Chapter 7: Deep Learning
Multi-layer feed-forward neural network
Inputs, neurons, activation function, and mathematical notation
Multi-layered neural network
Structure and mathematical notations
Activation functions in NN
Limitations of neural networks
Vanishing gradients, local optimum, and slow training
Building blocks for deep learning
Rectified linear activation function
Restricted Boltzmann Machines
Unsupervised pre-training and supervised fine-tuning
Deep learning with dropouts
Convolutional Neural Network
Recurrent Neural Networks
Data sampling and transformation
Models, results, and evaluation
Parameter search using Arbiter
Chapter 8: Text Mining and Natural Language Processing
NLP, subfields, and tasks
Part-of-speech tagging (POS tagging)
Information extraction and named entity recognition
Sentiment analysis and opinion mining
Word sense disambiguation
Semantic reasoning and inferencing
Automating question and answers
Issues with mining unstructured data
Text processing components and transformations
Document collection and standardization
Stemming or lemmatization
Local/global dictionary or vocabulary
Feature extraction/generation
Feature representation and similarity
Feature selection and dimensionality reduction
Text categorization/classification
Probabilistic latent semantic analysis (PLSA)
Feature transformation, selection, and reduction
Evaluation of text clustering
Hidden Markov models for NER
Maximum entropy Markov models for NER
Topic modeling with MALLET
Data sampling and transformation
Feature analysis and dimensionality reduction
Models, results, and evaluation
Analysis of text processing results
Chapter 9: Big Data Machine Learning – The Final Frontier
What are the characteristics of Big Data?
Big Data Machine Learning
General Big Data framework
Big Data cluster deployment frameworks
Data processing and preparation
Visualization and analysis
Batch Big Data Machine Learning
H2O as Big Data Machine Learning platform
Data sampling and transformation
Experiments, results, and analysis
Spark MLlib as Big Data Machine Learning platform
Machine Learning in MLlib
Experiments, results, and analysis
Real-time Big Data Machine Learning
SAMOA as a real-time Big Data Machine Learning framework
Machine Learning algorithms
Experiments, results, and analysis
The future of Machine Learning
Appendix A: Linear Algebra
Scalar product of vectors
Singular value decomposition (SVD)
Gaussian standard deviation