Deep Learning for Computer Vision

Author: Rajalingappaa Shanmugamani, Abdul Ghani Abdul Rahman, Stephen Maurice Moore, Nishanth Koganti

Publisher: Packt Publishing

Publication year: 2018

E-ISBN: 9781788293358

P-ISBN(Paperback): 89543100771530

Subject: TP18 artificial intelligence theory

Language: ENG

Chapter

Chapter 1: Getting Started

Understanding deep learning

Perceptron

Activation functions

Sigmoid

The hyperbolic tangent function

The Rectified Linear Unit (ReLU)

Artificial neural network (ANN)

One-hot encoding

Softmax

Cross-entropy

Dropout

Batch normalization

L1 and L2 regularization

Training neural networks

Backpropagation

Gradient descent

Stochastic gradient descent

Playing with TensorFlow playground

Convolutional neural network

Kernel

Max pooling

Recurrent neural networks (RNN)

Long short-term memory (LSTM)

Deep learning for computer vision

Classification

Detection or localization and segmentation

Similarity learning

Image captioning

Generative models

Video analysis

Development environment setup

Hardware and Operating Systems - OS

General Purpose - Graphics Processing Unit (GP-GPU)

Compute Unified Device Architecture - CUDA

CUDA Deep Neural Network - CUDNN

Installing software packages

Python

Open Computer Vision - OpenCV

The TensorFlow library

Installing TensorFlow

TensorFlow example to print Hello, TensorFlow

TensorFlow example for adding two numbers

TensorBoard

The TensorFlow Serving tool

The Keras library

Summary

Chapter 2: Image Classification

Training the MNIST model in TensorFlow

The MNIST datasets

Loading the MNIST data

Building a perceptron

Defining placeholders for input data and targets

Defining the variables for a fully connected layer

Training the model with data

Building a multilayer convolutional network

Utilizing TensorBoard in deep learning

Training the MNIST model in Keras

Preparing the dataset

Building the model

Other popular image testing datasets

The CIFAR dataset

The Fashion-MNIST dataset

The ImageNet dataset and competition

The bigger deep learning models

The AlexNet model

The VGG-16 model

The Google Inception-V3 model

The Microsoft ResNet-50 model

The SqueezeNet model

Spatial transformer networks

The DenseNet model

Training a model for cats versus dogs

Preparing the data

Benchmarking with simple CNN

Augmenting the dataset

Augmentation techniques

Transfer learning or fine-tuning of a model

Training on bottleneck features

Fine-tuning several layers in deep learning

Developing real-world applications

Choosing the right model

Tackling the underfitting and overfitting scenarios

Gender and age detection from face

Fine-tuning apparel models

Brand safety

Summary

Chapter 3: Image Retrieval

Understanding visual features

Visualizing activation of deep learning models

Embedding visualization

Guided backpropagation

The DeepDream

Adversarial examples

Model inference

Exporting a model

Serving the trained model

Content-based image retrieval

Building the retrieval pipeline

Extracting bottleneck features for an image

Computing similarity between query image and target database

Efficient retrieval

Matching faster using approximate nearest neighbour

Advantages of ANNOY

Autoencoders of raw images

Denoising using autoencoders

Summary

Chapter 4: Object Detection

Detecting objects in an image

Exploring the datasets

ImageNet dataset

PASCAL VOC challenge

COCO object detection challenge

Evaluating datasets using metrics

Intersection over Union

The mean average precision

Localizing algorithms

Localizing objects using sliding windows

The scale-space concept

Training a fully connected layer as a convolution layer

Convolution implementation of sliding window

Thinking about localization as a regression problem

Applying regression to other problems

Combining regression with the sliding window

Detecting objects

Regions of the convolutional neural network (R-CNN)

Fast R-CNN

Faster R-CNN

Single shot multi-box detector

Object detection API

Installation and setup

Pre-trained models

Re-training object detection models

Data preparation for the Pet dataset

Object detection training pipeline

Training the model

Monitoring loss and accuracy using TensorBoard

Training a pedestrian detection for a self-driving car

The YOLO object detection algorithm

Summary

Chapter 5: Semantic Segmentation

Predicting pixels

Diagnosing medical images

Understanding the earth from satellite imagery

Enabling robots to see

Datasets

Algorithms for semantic segmentation

The Fully Convolutional Network

The SegNet architecture

Upsampling the layers by pooling

Sampling the layers by convolution

Skipping connections for better training

Dilated convolutions

DeepLab

RefineNet

PSPNet

Large kernel matters

DeepLab v3

Ultra-nerve segmentation

Segmenting satellite images

Modeling FCN for segmentation

Segmenting instances

Summary

Chapter 6: Similarity Learning

Algorithms for similarity learning

Siamese networks

Contrastive loss

FaceNet

Triplet loss

The DeepNet model

DeepRank

Visual recommendation systems

Human face analysis

Face detection

Face landmarks and attributes

The Multi-Task Facial Landmark (MTFL) dataset

The Kaggle keypoint dataset

The Multi-Attribute Facial Landmark (MAFL) dataset

Learning the facial key points

Face recognition

The labeled faces in the wild (LFW) dataset

The YouTube faces dataset

The CelebFaces Attributes dataset (CelebA)

CASIA web face database

The VGGFace2 dataset

Computing the similarity between faces

Finding the optimum threshold

Face clustering

Summary

Chapter 7: Image Captioning

Understanding the problem and datasets

Understanding natural language processing for image captioning

Expressing words in vector form

Converting words to vectors

Training an embedding

Approaches for image captioning and related problems

Using a conditional random field for linking image and text

Using RNN on CNN features to generate captions

Creating captions using image ranking

Retrieving captions from images and images from captions

Dense captioning

Using RNN for captioning

Using multimodal metric space

Using attention network for captioning

Knowing when to look

Implementing attention-based image captioning

Summary

Chapter 8: Generative Models

Applications of generative models

Artistic style transfer

Predicting the next frame in a video

Super-resolution of images

Interactive image generation

Image to image translation

Text to image generation

Inpainting

Blending

Transforming attributes

Creating training data

Creating new animation characters

3D models from photos

Neural artistic style transfer

Content loss

Style loss using the Gram matrix

Style transfer

Generative Adversarial Networks

Vanilla GAN

Conditional GAN

Adversarial loss

Image translation

InfoGAN

Drawbacks of GAN

Visual dialogue model

Algorithm for VDM

Generator

Discriminator

Summary

Chapter 9: Video Classification

Understanding and classifying videos

Exploring video classification datasets

UCF101

YouTube-8M

Other datasets

Splitting videos into frames

Approaches for classifying videos

Fusing parallel CNN for video classification

Classifying videos over long periods

Streaming two CNNs for action recognition

Using 3D convolution for temporal learning

Using trajectory for classification

Multi-modal fusion

Attending regions for classification

Extending image-based approaches to videos

Regressing the human pose

Tracking facial landmarks

Segmenting videos

Captioning videos

Generating videos

Summary

Chapter 10: Deployment

Performance of models

Quantizing the models

MobileNets

Deployment in the cloud

AWS

Google Cloud Platform

Deployment of models in devices

Jetson TX2

Android

iPhone

Summary

Other Books You May Enjoy

Index
