Mastering Java Machine Learning

Author: Dr. Uday Kamath; Krishna Choppella

Publisher: Packt Publishing

Publication year: 2017

E-ISBN: 9781785888557

P-ISBN(Paperback): 9781785880513

Subject: TN919.5 Data processing systems and equipment; TP Automation Technology, Computer Technology; TP274 Data processing, data processing systems

Keyword: Data processing systems and equipment; data processing, data processing systems; automation technology, computer technology

Language: ENG

Description

Become an advanced practitioner with this progressive set of master classes on application-oriented machine learning.

About This Book

• Comprehensive coverage of key topics in machine learning, with an emphasis on both the theoretical and practical aspects.
• More than 15 open source Java tools in a wide range of techniques, with code and practical usage.
• More than 10 real-world case studies in machine learning, highlighting techniques ranging from data ingestion up to analyzing the results of experiments, all preparing the user for the practical, real-world use of tools and data analysis.

Who This Book Is For

This book will appeal to anyone with a serious interest in topics in Data Science or those already working in related areas: ideally, intermediate-level data analysts and data scientists with experience in Java. Preferably, you will have experience with the fundamentals of machine learning and now have a desire to explore the area further, are up to grappling with the mathematical complexities of its algorithms, and wish to learn the complete ins and outs of practical machine learning.

What You Will Learn

• Master key Java machine learning libraries, and what kind of problem each can solve, with theory and practical guidance.
• Explore powerful techniques in each major category of machine learning, such as classification, clustering, anomaly detection, graph modeling, and text mining.
• Apply machine learning to real-world data with methodologies, processes, ap

Chapter

Preface

Chapter 1: Machine Learning Review

Machine learning – history and definition

What is not machine learning?

Machine learning – concepts and terminology

Machine learning – types and subtypes

Datasets used in machine learning

Machine learning applications

Practical issues in machine learning

Machine learning – roles and process

Roles

Process

Machine learning – tools and datasets

Datasets

Summary

Chapter 2: Practical Approach to Real-World Supervised Learning

Formal description and notation

Data quality analysis

Descriptive data analysis

Basic label analysis

Basic feature analysis

Visualization analysis

Univariate feature analysis

Multivariate feature analysis

Data transformation and preprocessing

Feature construction

Handling missing values

Outliers

Discretization

Data sampling

Is sampling needed?

Undersampling and oversampling

Training, validation, and test set

Feature relevance analysis and dimensionality reduction

Feature search techniques

Feature evaluation techniques

Filter approach

Wrapper approach

Embedded approach

Model building

Linear models

Linear Regression

Naïve Bayes

Logistic Regression

Non-linear models

Decision Trees

K-Nearest Neighbors (KNN)

Support vector machines (SVM)

Ensemble learning and meta learners

Bootstrap aggregating or bagging

Boosting

Model assessment, evaluation, and comparisons

Model assessment

Model evaluation metrics

Confusion matrix and related metrics

ROC and PRC curves

Gain charts and lift curves

Model comparisons

Comparing two algorithms

Comparing multiple algorithms

Case Study – Horse Colic Classification

Business problem

Machine learning mapping

Data analysis

Label analysis

Features analysis

Supervised learning experiments

Weka experiments

RapidMiner experiments

Results, observations, and analysis

Summary

References

Chapter 3: Unsupervised Machine Learning Techniques

Issues in common with supervised learning

Issues specific to unsupervised learning

Feature analysis and dimensionality reduction

Notation

Linear methods

Principal component analysis (PCA)

Random projections (RP)

Multidimensional Scaling (MDS)

Nonlinear methods

Kernel Principal Component Analysis (KPCA)

Manifold learning

Clustering

Clustering algorithms

k-Means

DBSCAN

Mean shift

Expectation maximization (EM) or Gaussian mixture modeling (GMM)

Hierarchical clustering

Self-organizing maps (SOM)

Spectral clustering

Affinity propagation

Clustering validation and evaluation

Internal evaluation measures

External evaluation measures

Outlier or anomaly detection

Outlier algorithms

Statistical-based

Distance-based methods

Density-based methods

Clustering-based methods

High-dimensional-based methods

One-class SVM

Outlier evaluation techniques

Supervised evaluation

Unsupervised evaluation

Real-world case study

Tools and software

Business problem

Machine learning mapping

Data collection

Data quality analysis

Data sampling and transformation

Feature analysis and dimensionality reduction

Observations on feature analysis and dimensionality reduction

Clustering models, results, and evaluation

Observations and clustering analysis

Outlier models, results, and evaluation

Summary

References

Chapter 4: Semi-Supervised and Active Learning

Semi-supervised learning

Representation, notation, and assumptions

Semi-supervised learning techniques

Self-training SSL

Co-training SSL or multi-view SSL

Cluster and label SSL

Transductive graph label propagation

Transductive SVM (TSVM)

Case study in semi-supervised learning

Tools and software

Business problem

Machine learning mapping

Data collection

Data quality analysis

Data sampling and transformation

Datasets and analysis

Experiments and results

Active learning

Representation and notation

Active learning scenarios

Active learning approaches

Uncertainty sampling

Version space sampling

Query by disagreement (QBD)

Advantages and limitations

Data distribution sampling

How does it work?

Advantages and limitations

Case study in active learning

Tools and software

Business problem

Machine learning mapping

Data collection

Data sampling and transformation

Feature analysis and dimensionality reduction

Models, results, and evaluation

Pool-based scenarios

Stream-based scenarios

Analysis of active learning results

Summary

References

Chapter 5: Real-Time Stream Machine Learning

Assumptions and mathematical notations

Basic stream processing and computational techniques

Stream computations

Sliding windows

Sampling

Concept drift and drift detection

Data management

Partial memory

Full memory

Detection methods

Adaptation methods

Incremental supervised learning

Modeling techniques

Linear algorithms

Non-linear algorithms

Ensemble algorithms

Validation, evaluation, and comparisons in online setting

Model validation techniques

Incremental unsupervised learning using clustering

Modeling techniques

Partition based

Hierarchical based and micro clustering

Density based

Grid based

Validation and evaluation techniques

Unsupervised learning using outlier detection

Partition-based clustering for outlier detection

Inputs and outputs

How does it work?

Advantages and limitations

Distance-based clustering for outlier detection

Inputs and outputs

How does it work?

Validation and evaluation techniques

Case study in stream learning

Tools and software

Business problem

Machine learning mapping

Data collection

Data sampling and transformation

Feature analysis and dimensionality reduction

Models, results, and evaluation

Supervised learning experiments

Clustering experiments

Outlier detection experiments

Analysis of stream learning results

Summary

References

Chapter 6: Probabilistic Graph Modeling

Probability revisited

Concepts in probability

Conditional probability

Chain rule and Bayes' theorem

Random variables, joint, and marginal distributions

Marginal independence and conditional independence

Factors

Distribution queries

Graph concepts

Graph structure and properties

Subgraphs and cliques

Path, trail, and cycles

Bayesian networks

Representation

Definition

Reasoning patterns

Independencies, flow of influence, D-Separation, I-Map

Inference

Elimination-based inference

Propagation-based techniques

Sampling-based techniques

Learning

Learning parameters

Learning structures

Markov networks and conditional random fields

Representation

Parameterization

Independencies

Inference

Learning

Conditional random fields

Specialized networks

Tree augmented network

Input and output

How does it work?

Advantages and limitations

Markov chains

Hidden Markov models

Most probable path in HMM

Posterior decoding in HMM

Tools and usage

OpenMarkov

Weka Bayesian Network GUI

Case study

Business problem

Machine learning mapping

Data sampling and transformation

Feature analysis

Models, results, and evaluation

Analysis of results

Summary

References

Chapter 7: Deep Learning

Multi-layer feed-forward neural network

Inputs, neurons, activation function, and mathematical notation

Multi-layered neural network

Structure and mathematical notations

Activation functions in NN

Training neural network

Limitations of neural networks

Vanishing gradients, local optimum, and slow training

Deep learning

Building blocks for deep learning

Rectified linear activation function

Restricted Boltzmann Machines

Autoencoders

Unsupervised pre-training and supervised fine-tuning

Deep feed-forward NN

Deep Autoencoders

Deep Belief Networks

Deep learning with dropouts

Sparse coding

Convolutional Neural Network

CNN Layers

Recurrent Neural Networks

Case study

Tools and software

Business problem

Machine learning mapping

Data sampling and transformation

Feature analysis

Models, results, and evaluation

Basic data handling

Multi-layer perceptron

Convolutional Network

Variational Autoencoder

DBN

Parameter search using Arbiter

Results and analysis

Summary

References

Chapter 8: Text Mining and Natural Language Processing

NLP, subfields, and tasks

Text categorization

Part-of-speech tagging (POS tagging)

Text clustering

Information extraction and named entity recognition

Sentiment analysis and opinion mining

Coreference resolution

Word sense disambiguation

Machine translation

Semantic reasoning and inferencing

Text summarization

Automating question and answers

Issues with mining unstructured data

Text processing components and transformations

Document collection and standardization

Inputs and outputs

How does it work?

Tokenization

Inputs and outputs

How does it work?

Stop words removal

Inputs and outputs

How does it work?

Stemming or lemmatization

Inputs and outputs

How does it work?

Local/global dictionary or vocabulary

Feature extraction/generation

Lexical features

Syntactic features

Semantic features

Feature representation and similarity

Vector space model

Similarity measures

Feature selection and dimensionality reduction

Feature selection

Dimensionality reduction

Topics in text mining

Text categorization/classification

Topic modeling

Probabilistic latent semantic analysis (PLSA)

Text clustering

Feature transformation, selection, and reduction

Clustering techniques

Evaluation of text clustering

Named entity recognition

Hidden Markov models for NER

Maximum entropy Markov models for NER

Deep learning and NLP

Tools and usage

Mallet

KNIME

Topic modeling with Mallet

Business problem

Machine Learning mapping

Data collection

Data sampling and transformation

Feature analysis and dimensionality reduction

Models, results, and evaluation

Analysis of text processing results

Summary

References

Chapter 9: Big Data Machine Learning – The Final Frontier

What are the characteristics of Big Data?

Big Data Machine Learning

General Big Data framework

Big Data cluster deployment frameworks

Data acquisition

Data storage

Data processing and preparation

Machine Learning

Visualization and analysis

Batch Big Data Machine Learning

H2O as Big Data Machine Learning platform

H2O architecture

Machine learning in H2O

Tools and usage

Case study

Business problem

Machine Learning mapping

Data collection

Data sampling and transformation

Experiments, results, and analysis

Spark MLlib as Big Data Machine Learning platform

Spark architecture

Machine Learning in MLlib

Tools and usage

Experiments, results, and analysis

Real-time Big Data Machine Learning

SAMOA as a real-time Big Data Machine Learning framework

Machine Learning algorithms

Tools and usage

Experiments, results, and analysis

The future of Machine Learning

Summary

References

Appendix A: Linear Algebra

Vector

Scalar product of vectors

Matrix

Transpose of a matrix

Matrix addition

Scalar multiplication

Matrix multiplication

Singular value decomposition (SVD)

Appendix B: Probability

Axioms of probability

Bayes' theorem

Density estimation

Mean

Variance

Standard deviation

Gaussian standard deviation

Covariance

Correlation coefficient

Binomial distribution

Poisson distribution

Gaussian distribution

Central limit theorem

Error propagation

Index
