Chapter 1: A Simple Guide to R
Installing packages and getting help in R
Editing a data frame in R
Writing if else statements in R
The apply, lapply, sapply, and tapply functions
Using par to beautify a plot in R
Chapter 2: Practical Machine Learning with R
Downloading and installing R
Downloading and installing RStudio
Installing and loading packages
Using R to manipulate data
Applying basic statistics
Getting a dataset for machine learning
Chapter 3: Acquire and Prepare the Ingredients – Your Data
Reading data from CSV files
Reading data from fixed-width formatted files
Reading data from R files and R libraries
Removing cases with missing values
Replacing missing values with the mean
Rescaling a variable to [0,1]
Normalizing or standardizing data in a data frame
Creating dummies for categorical variables
Chapter 4: What's in There? – Exploratory Data Analysis
Creating standard data summaries
Extracting a subset of a dataset
Creating random data partitions
Generating standard plots such as histograms, boxplots, and scatterplots
Generating multiple plots on a grid
Selecting a graphics device
Creating plots with the lattice package
Creating plots with the ggplot2 package
Creating charts that facilitate comparisons
Creating charts that help visualize a possible causality
Creating multivariate plots
Chapter 5: Where Does It Belong? – Classification
Generating error/classification-confusion matrices
Building, plotting, and evaluating classification trees
Using random forest models for classification
Classifying using Support Vector Machine
Classifying using the Naïve Bayes approach
Classifying using the KNN approach
Using neural networks for classification
Classifying using linear discriminant function analysis
Classifying using logistic regression
Using AdaBoost to combine classification tree models
Chapter 6: Give Me a Number – Regression
Computing the root mean squared error
Building KNN models for regression
Performing linear regression
Performing variable selection in linear regression
Building regression trees
Building random forest models for regression
Using neural networks for regression
Performing k-fold cross-validation
Performing leave-one-out cross-validation to limit overfitting
Chapter 7: Can You Simplify That? – Data Reduction Techniques
Performing cluster analysis using K-means clustering
Performing cluster analysis using hierarchical clustering
Reducing dimensionality with principal component analysis
Chapter 8: Lessons from History – Time Series Analysis
Creating and examining date objects
Operating on date objects
Performing preliminary analyses on time series data
Using time series objects
Filtering time series data
Smoothing and forecasting using the Holt-Winters method
Building an automated ARIMA model
Chapter 9: It's All About Your Connections – Social Network Analysis
Downloading social network data using public APIs
Creating adjacency matrices and edge lists
Plotting social network data
Computing important network metrics
Chapter 10: Put Your Best Foot Forward – Document and Present Your Analysis
Generating reports of your data analysis with R Markdown and knitr
Creating interactive web applications with shiny
Creating PDF presentations of your analysis with R Presentation
Chapter 11: Work Smarter, Not Harder – Efficient and Elegant R Code
Exploiting vectorized operations
Processing entire rows or columns using the apply function
Applying a function to all elements of a collection with lapply and sapply
Applying functions to subsets of a vector
Using the split-apply-combine strategy with plyr
Slicing, dicing, and combining data with data tables
Chapter 12: Where in the World? – Geospatial Analysis
Downloading and plotting a Google map of an area
Overlaying data on the downloaded Google map
Importing ESRI shape files into R
Using the sp package to plot geographic data
Getting maps from the maps package
Creating spatial data frames from regular data frames containing spatial and other data
Creating spatial data frames by combining regular data frames with spatial objects
Adding variables to an existing spatial data frame
Chapter 13: Playing Nice – Connecting to Other Systems
Using JRI to call R functions from Java
Using Rserve to call R functions from Java
Executing R scripts from Java
Using the xlsx package to connect to Excel
Reading data from relational databases – MySQL
Reading data from NoSQL databases – MongoDB
Chapter 1: Basic and Interactive Plots
Introducing a scatter plot
Scatter plots with texts, labels, and lines
Connecting points in a scatter plot
Generating an interactive scatter plot
Line plot to tell an effective story
Generating an interactive Gantt/timeline chart in R
Making an interactive bubble plot
Constructing a waterfall plot in R
Chapter 2: Heat Maps and Dendrograms
Constructing a simple dendrogram
Creating dendrograms with colors and labels
Generating a heat map with customized colors
Generating an integrated dendrogram and a heat map
Creating a three-dimensional heat map and a stereo map
Constructing a tree map in R
Chapter 3: Maps
Introducing regional maps
Introducing choropleth maps
Constructing maps with bubbles
Integrating text with maps
Chapter 4: The Pie Chart and Its Alternatives
Generating a simple pie chart
Constructing pie charts with labels
Creating donut plots and interactive plots
Chapter 5: Adding the Third Dimension
Constructing a 3D scatter plot
Generating a 3D scatter plot with text
Generating a 3D contour plot
Integrating a 3D contour and a surface plot
Animating a 3D surface plot
Chapter 6: Data in Higher Dimensions
Constructing a sunflower plot
Generating interactive calendar maps
Creating Chernoff faces in R
Constructing a coxcomb plot in R
Constructing network plots
Constructing a radial plot
Generating a very basic pyramid plot
Chapter 7: Visualizing Continuous Data
Generating a candlestick plot
Generating interactive candlestick plots
Generating a decomposed time series
Plotting a regression line
Constructing a box and whiskers plot
Generating a quantile-quantile plot (QQ plot)
Generating a density plot
Generating a simple correlation plot
Chapter 8: Visualizing Text and XKCD-style Plots
Constructing a word cloud from a document
Generating a comparison cloud
Constructing a correlation plot and a phrase tree
Generating plots with custom fonts
Generating an XKCD-style plot
Chapter 9: Creating Applications in R
Creating animated plots in R
Creating a presentation in R
A basic introduction to API and XML
Constructing a bar plot using XML in R
Creating a very simple shiny app in R
Chapter 1: Data Exploration with RMS Titanic
Reading a Titanic dataset from a CSV file
Converting types on character variables
Exploring and visualizing data
Predicting passenger survival with a decision tree
Validating the power of prediction with a confusion matrix
Assessing performance with the ROC curve
Chapter 2: R and Statistics
Understanding data sampling in R
Operating a probability distribution in R
Working with univariate descriptive statistics in R
Performing correlations and multivariate analysis
Operating linear regression and multivariate analysis
Conducting an exact binomial test
Performing Student's t-test
Performing the Kolmogorov-Smirnov test
Understanding the Wilcoxon Rank Sum and Signed Rank test
Working with Pearson's Chi-squared test
Conducting a one-way ANOVA
Performing a two-way ANOVA
Chapter 3: Understanding Regression Analysis
Fitting a linear regression model with lm
Summarizing linear model fits
Using linear regression to predict unknown values
Generating a diagnostic plot of a fitted model
Fitting a polynomial regression model with lm
Fitting a robust linear regression model with rlm
Studying a case of linear regression on SLID data
Applying the Gaussian model for generalized linear regression
Applying the Poisson model for generalized linear regression
Applying the Binomial model for generalized linear regression
Fitting a generalized additive model to data
Visualizing a generalized additive model
Diagnosing a generalized additive model
Chapter 4: Classification (I) – Tree, Lazy, and Probabilistic
Preparing the training and testing datasets
Building a classification model with recursive partitioning trees
Visualizing a recursive partitioning tree
Measuring the prediction performance of a recursive partitioning tree
Pruning a recursive partitioning tree
Building a classification model with a conditional inference tree
Visualizing a conditional inference tree
Measuring the prediction performance of a conditional inference tree
Classifying data with the k-nearest neighbor classifier
Classifying data with logistic regression
Classifying data with the Naïve Bayes classifier
Chapter 5: Classification (II) – Neural Network and SVM
Classifying data with a support vector machine
Choosing the cost of a support vector machine
Predicting labels based on a model trained by a support vector machine
Tuning a support vector machine
Training a neural network with neuralnet
Visualizing a neural network trained by neuralnet
Predicting labels based on a model trained by neuralnet
Training a neural network with nnet
Predicting labels based on a model trained by nnet
Chapter 6: Model Evaluation
Estimating model performance with k-fold cross-validation
Performing cross-validation with the e1071 package
Performing cross-validation with the caret package
Ranking the variable importance with the caret package
Ranking the variable importance with the rminer package
Finding highly correlated features with the caret package
Selecting features using the caret package
Measuring the performance of the regression model
Measuring prediction performance with a confusion matrix
Measuring prediction performance using ROCR
Comparing an ROC curve using the caret package
Measuring performance differences between models with the caret package
Chapter 7: Ensemble Learning
Classifying data with the bagging method
Performing cross-validation with the bagging method
Classifying data with the boosting method
Performing cross-validation with the boosting method
Classifying data with gradient boosting
Calculating the margins of a classifier
Calculating the error evolution of the ensemble method
Classifying data with random forest
Estimating the prediction errors of different classifiers
Chapter 8: Clustering
Clustering data with hierarchical clustering
Cutting trees into clusters
Clustering data with the k-means method
Drawing a bivariate cluster plot
Comparing clustering methods
Extracting silhouette information from clustering
Obtaining the optimum number of clusters for k-means
Clustering data with the density-based method
Clustering data with the model-based method
Visualizing a dissimilarity matrix
Validating clusters externally
Chapter 9: Association Analysis and Sequence Mining
Transforming data into transactions
Displaying transactions and associations
Mining associations with the Apriori rule
Visualizing association rules
Mining frequent itemsets with Eclat
Creating transactions with temporal information
Mining frequent sequential patterns with cSPADE
Chapter 10: Dimension Reduction
Performing feature selection with FSelector
Performing dimension reduction with PCA
Determining the number of principal components using the scree test
Determining the number of principal components using the Kaiser method
Visualizing multivariate data using biplot
Performing dimension reduction with MDS
Reducing dimensions with SVD
Compressing images with SVD
Performing nonlinear dimension reduction with ISOMAP
Performing nonlinear dimension reduction with Local Linear Embedding
Chapter 11: Big Data Analysis (R and Hadoop)
Preparing the RHadoop environment
Operating HDFS with rhdfs
Implementing a word count problem with RHadoop
Comparing the performance between an R MapReduce program and a standard R program
Testing and debugging the rmr2 program
Manipulating data with plyrmr
Conducting machine learning with RHadoop
Configuring RHadoop clusters on Amazon EMR
Appendix A: Resources for R and Machine Learning
Appendix B: Dataset – Survival of Passengers on the Titanic