R: Recipes for Analysis, Visualization and Machine Learning

Author: Viswa Viswanathan; Shanthi Viswanathan; Atmajitsinh Gohil; Yu-Wei Chiu (David Chiu)

Publisher: Packt Publishing

Publication year: 2016

E-ISBN: 9781787288799

P-ISBN (Paperback): 9781787289598

Subject: TN919.5 Data processing systems and equipment; TP274 Data processing, data processing systems; TP39 Computer applications

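Keyword: Computer applications; automation technology, computer technology; data processing, data processing systems; data processing systems and equipment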

Language: ENG

Description

Get savvy with the R language and actualize projects aimed at analysis, visualization, and machine learning.

About This Book
• Proficiently analyze data and apply machine learning techniques
• Generate visualizations, and develop interactive visualizations and applications to understand various data exploratory functions in R
• Construct a predictive model by using a variety of machine learning packages

Who This Book Is For
This Learning Path is ideal for those who have been exposed to R but have not yet used it extensively. It covers the basics of using R and is written for new and intermediate R users interested in learning. This Learning Path also provides in-depth insights into professional techniques for analysis, visualization, and machine learning with R – it will help you increase your R expertise, regardless of your level of experience.

What You Will Learn
• Get data into your R environment and prepare it for analysis
• Perform exploratory data analyses and generate meaningful visualizations of the data
• Generate various plots in R using the basic R plotting techniques
• Create presentations and learn the basics of creating apps in R for your audience
• Create and inspect the transaction dataset, performing association analysis with the Apriori algorithm
• Visualize associations in various graph formats and find frequent itemsets using the ECLAT algorithm
• Build, tune, and evaluate predictive models with different machine learning packages
• Incorporate R and Hadoop t…

Contents

Module 1

Chapter 1: A Simple Guide to R

Installing packages and getting help in R

Data types in R

Special values in R

Matrices in R

Editing a matrix in R

Data frames in R

Editing a data frame in R

Importing data in R

Exporting data in R

Writing a function in R

Writing if else statements in R

Basic loops in R

Nested loops in R

The apply, lapply, sapply, and tapply functions

Using par to beautify a plot in R

Saving plots

Chapter 2: Practical Machine Learning with R

Introduction

Downloading and installing R

Downloading and installing RStudio

Installing and loading packages

Reading and writing data

Using R to manipulate data

Applying basic statistics

Visualizing data

Getting a dataset for machine learning

Chapter 3: Acquire and Prepare the Ingredients – Your Data

Introduction

Reading data from CSV files

Reading XML data

Reading JSON data

Reading data from fixed-width formatted files

Reading data from R files and R libraries

Removing cases with missing values

Replacing missing values with the mean

Removing duplicate cases

Rescaling a variable to [0,1]

Normalizing or standardizing data in a data frame

Binning numerical data

Creating dummies for categorical variables

Chapter 4: What's in There? – Exploratory Data Analysis

Introduction

Creating standard data summaries

Extracting a subset of a dataset

Splitting a dataset

Creating random data partitions

Generating standard plots such as histograms, boxplots, and scatterplots

Generating multiple plots on a grid

Selecting a graphics device

Creating plots with the lattice package

Creating plots with the ggplot2 package

Creating charts that facilitate comparisons

Creating charts that help visualize a possible causality

Creating multivariate plots

Chapter 5: Where Does It Belong? – Classification

Introduction

Generating error/classification-confusion matrices

Generating ROC charts

Building, plotting, and evaluating – classification trees

Using random forest models for classification

Classifying using Support Vector Machine

Classifying using the Naïve Bayes approach

Classifying using the KNN approach

Using neural networks for classification

Classifying using linear discriminant function analysis

Classifying using logistic regression

Using AdaBoost to combine classification tree models

Chapter 6: Give Me a Number – Regression

Introduction

Computing the root mean squared error

Building KNN models for regression

Performing linear regression

Performing variable selection in linear regression

Building regression trees

Building random forest models for regression

Using neural networks for regression

Performing k-fold cross-validation

Performing leave-one-out cross-validation to limit overfitting

Chapter 7: Can You Simplify That? – Data Reduction Techniques

Introduction

Performing cluster analysis using K-means clustering

Performing cluster analysis using hierarchical clustering

Reducing dimensionality with principal component analysis

Chapter 8: Lessons from History – Time Series Analysis

Introduction

Creating and examining date objects

Operating on date objects

Performing preliminary analyses on time series data

Using time series objects

Decomposing time series

Filtering time series data

Smoothing and forecasting using the Holt-Winters method

Building an automated ARIMA model

Chapter 9: It's All About Your Connections – Social Network Analysis

Introduction

Downloading social network data using public APIs

Creating adjacency matrices and edge lists

Plotting social network data

Computing important network metrics

Chapter 10: Put Your Best Foot Forward – Document and Present Your Analysis

Introduction

Generating reports of your data analysis with R Markdown and knitr

Creating interactive web applications with shiny

Creating PDF presentations of your analysis with R Presentation

Chapter 11: Work Smarter, Not Harder – Efficient and Elegant R Code

Introduction

Exploiting vectorized operations

Processing entire rows or columns using the apply function

Applying a function to all elements of a collection with lapply and sapply

Applying functions to subsets of a vector

Using the split-apply-combine strategy with plyr

Slicing, dicing, and combining data with data tables

Chapter 12: Where in the World? – Geospatial Analysis

Introduction

Downloading and plotting a Google map of an area

Overlaying data on the downloaded Google map

Importing ESRI shape files into R

Using the sp package to plot geographic data

Getting maps from the maps package

Creating spatial data frames from regular data frames containing spatial and other data

Creating spatial data frames by combining regular data frames with spatial objects

Adding variables to an existing spatial data frame

Chapter 13: Playing Nice – Connecting to Other Systems

Introduction

Using Java objects in R

Using JRI to call R functions from Java

Using Rserve to call R functions from Java

Executing R scripts from Java

Using the xlsx package to connect to Excel

Reading data from relational databases – MySQL

Reading data from NoSQL databases – MongoDB

Module 2

Chapter 1: Basic and Interactive Plots

Introduction

Introducing a scatter plot

Scatter plots with text, labels, and lines

Connecting points in a scatter plot

Generating an interactive scatter plot

A simple bar plot

An interactive bar plot

A simple line plot

Line plot to tell an effective story

Generating an interactive Gantt/timeline chart in R

Merging histograms

Making an interactive bubble plot

Constructing a waterfall plot in R

Chapter 2: Heat Maps and Dendrograms

Introduction

Constructing a simple dendrogram

Creating dendrograms with colors and labels

Creating a heat map

Generating a heat map with customized colors

Generating an integrated dendrogram and a heat map

Creating a three-dimensional heat map and a stereo map

Constructing a tree map in R

Chapter 3: Maps

Introduction

Introducing regional maps

Introducing choropleth maps

A guide to contour maps

Constructing maps with bubbles

Integrating text with maps

Introducing shapefiles

Creating cartograms

Chapter 4: The Pie Chart and Its Alternatives

Introduction

Generating a simple pie chart

Constructing pie charts with labels

Creating donut plots and interactive plots

Generating a slope chart

Constructing a fan plot

Chapter 5: Adding the Third Dimension

Introduction

Constructing a 3D scatter plot

Generating a 3D scatter plot with text

A simple 3D pie chart

A simple 3D histogram

Generating a 3D contour plot

Integrating a 3D contour and a surface plot

Animating a 3D surface plot

Chapter 6: Data in Higher Dimensions

Introduction

Constructing a sunflower plot

Creating a hexbin plot

Generating interactive calendar maps

Creating Chernoff faces in R

Constructing a coxcomb plot in R

Constructing network plots

Constructing a radial plot

Generating a very basic pyramid plot

Chapter 7: Visualizing Continuous Data

Introduction

Generating a candlestick plot

Generating interactive candlestick plots

Generating a decomposed time series

Plotting a regression line

Constructing a box and whiskers plot

Generating a violin plot

Generating a quantile-quantile plot (QQ plot)

Generating a density plot

Generating a simple correlation plot

Chapter 8: Visualizing Text and XKCD-style Plots

Introduction

Generating a word cloud

Constructing a word cloud from a document

Generating a comparison cloud

Constructing a correlation plot and a phrase tree

Generating plots with custom fonts

Generating an XKCD-style plot

Chapter 9: Creating Applications in R

Introduction

Creating animated plots in R

Creating a presentation in R

A basic introduction to API and XML

Constructing a bar plot using XML in R

Creating a very simple shiny app in R

Module 3

Chapter 1: Data Exploration with RMS Titanic

Introduction

Reading a Titanic dataset from a CSV file

Converting types on character variables

Detecting missing values

Imputing missing values

Exploring and visualizing data

Predicting passenger survival with a decision tree

Validating the power of prediction with a confusion matrix

Assessing performance with the ROC curve

Chapter 2: R and Statistics

Introduction

Understanding data sampling in R

Operating a probability distribution in R

Working with univariate descriptive statistics in R

Performing correlations and multivariate analysis

Operating linear regression and multivariate analysis

Conducting an exact binomial test

Performing Student's t-test

Performing the Kolmogorov-Smirnov test

Understanding the Wilcoxon Rank Sum and Signed Rank tests

Working with Pearson's Chi-squared test

Conducting a one-way ANOVA

Performing a two-way ANOVA

Chapter 3: Understanding Regression Analysis

Introduction

Fitting a linear regression model with lm

Summarizing linear model fits

Using linear regression to predict unknown values

Generating a diagnostic plot of a fitted model

Fitting a polynomial regression model with lm

Fitting a robust linear regression model with rlm

Studying a case of linear regression on SLID data

Applying the Gaussian model for generalized linear regression

Applying the Poisson model for generalized linear regression

Applying the Binomial model for generalized linear regression

Fitting a generalized additive model to data

Visualizing a generalized additive model

Diagnosing a generalized additive model

Chapter 4: Classification (I) – Tree, Lazy, and Probabilistic

Introduction

Preparing the training and testing datasets

Building a classification model with recursive partitioning trees

Visualizing a recursive partitioning tree

Measuring the prediction performance of a recursive partitioning tree

Pruning a recursive partitioning tree

Building a classification model with a conditional inference tree

Visualizing a conditional inference tree

Measuring the prediction performance of a conditional inference tree

Classifying data with the k-nearest neighbor classifier

Classifying data with logistic regression

Classifying data with the Naïve Bayes classifier

Chapter 5: Classification (II) – Neural Network and SVM

Introduction

Classifying data with a support vector machine

Choosing the cost of a support vector machine

Visualizing an SVM fit

Predicting labels based on a model trained by a support vector machine

Tuning a support vector machine

Training a neural network with neuralnet

Visualizing a neural network trained by neuralnet

Predicting labels based on a model trained by neuralnet

Training a neural network with nnet

Predicting labels based on a model trained by nnet

Chapter 6: Model Evaluation

Introduction

Estimating model performance with k-fold cross-validation

Performing cross-validation with the e1071 package

Performing cross-validation with the caret package

Ranking the variable importance with the caret package

Ranking the variable importance with the rminer package

Finding highly correlated features with the caret package

Selecting features using the caret package

Measuring the performance of the regression model

Measuring prediction performance with a confusion matrix

Measuring prediction performance using ROCR

Comparing an ROC curve using the caret package

Measuring performance differences between models with the caret package

Chapter 7: Ensemble Learning

Introduction

Classifying data with the bagging method

Performing cross-validation with the bagging method

Classifying data with the boosting method

Performing cross-validation with the boosting method

Classifying data with gradient boosting

Calculating the margins of a classifier

Calculating the error evolution of the ensemble method

Classifying data with random forest

Estimating the prediction errors of different classifiers

Chapter 8: Clustering

Introduction

Clustering data with hierarchical clustering

Cutting trees into clusters

Clustering data with the k-means method

Drawing a bivariate cluster plot

Comparing clustering methods

Extracting silhouette information from clustering

Obtaining the optimum number of clusters for k-means

Clustering data with the density-based method

Clustering data with the model-based method

Visualizing a dissimilarity matrix

Validating clusters externally

Chapter 9: Association Analysis and Sequence Mining

Introduction

Transforming data into transactions

Displaying transactions and associations

Mining associations with the Apriori rule

Pruning redundant rules

Visualizing association rules

Mining frequent itemsets with Eclat

Creating transactions with temporal information

Mining frequent sequential patterns with cSPADE

Chapter 10: Dimension Reduction

Introduction

Performing feature selection with FSelector

Performing dimension reduction with PCA

Determining the number of principal components using the scree test

Determining the number of principal components using the Kaiser method

Visualizing multivariate data using biplot

Performing dimension reduction with MDS

Reducing dimensions with SVD

Compressing images with SVD

Performing nonlinear dimension reduction with ISOMAP

Performing nonlinear dimension reduction with Locally Linear Embedding

Chapter 11: Big Data Analysis (R and Hadoop)

Introduction

Preparing the RHadoop environment

Installing rmr2

Installing rhdfs

Operating HDFS with rhdfs

Implementing a word count problem with RHadoop

Comparing the performance between an R MapReduce program and a standard R program

Testing and debugging the rmr2 program

Installing plyrmr

Manipulating data with plyrmr

Conducting machine learning with RHadoop

Configuring RHadoop clusters on Amazon EMR

Appendix A: Resources for R and Machine Learning

Appendix B: Dataset – Survival of Passengers on the Titanic

Bibliography
