Hands-On Data Science with Anaconda

Authors: Yuxing Yan, James Yan

Publisher: Packt Publishing

Publication year: 2018

E-ISBN: 9781788834735

P-ISBN (Paperback): 89543100739250

Subject: TP274 Data processing; data processing systems

Language: ENG

Chapter 1: Ecosystem of Anaconda


Reasons for using Jupyter via Anaconda

Using Jupyter without pre-installation


Anaconda Cloud

Finding help


Review questions and exercises

Chapter 2: Anaconda Installation

Installing Anaconda

Anaconda for Windows

Testing Python

Using IPython

Using Python via Jupyter

Introducing Spyder

Installing R via Conda

Installing Julia and linking it to Jupyter

Installing Octave and linking it to Jupyter

Finding help


Review questions and exercises

Chapter 3: Data Basics

Sources of data

UCI machine learning

Introduction to the Python pandas package

Several ways to input data

Inputting data using R

Inputting data using Python

Introduction to the Quandl data delivery platform

Dealing with missing data

Data sorting

Slicing and dicing datasets

Merging different datasets

Data output

Introduction to the cbsodata Python package

Introduction to the datadotworld Python package

Introduction to the haven and foreign R packages

Introduction to the dslabs R package

Generating Python datasets

Generating R datasets


Review questions and exercises

Chapter 4: Data Visualization

Importance of data visualization

Data visualization in R

Data visualization in Python

Data visualization in Julia

Drawing simple graphs

Various bar charts, pie charts, and histograms

Adding a trend

Adding legends and other explanations

Visualization packages for R

Visualization packages for Python

Visualization packages for Julia

Dynamic visualization

Saving pictures as PDF

Saving dynamic visualization as an HTML file


Review questions and exercises

Chapter 5: Statistical Modeling in Anaconda

Introduction to linear models

Running a linear regression in R, Python, Julia, and Octave

Critical value and the decision rule

F-test, critical value, and the decision rule

An application of a linear regression in finance

Dealing with missing data

Removing missing data

Replacing missing data with another value

Detecting outliers and treatments

Several multivariate linear models

Collinearity and its solution

A model's performance measure


Review questions and exercises

Chapter 6: Managing Packages

Introduction to packages, modules, or toolboxes

Two examples of using packages

Finding all R packages

Finding all Python packages

Finding all Julia packages

Finding all Octave packages

Task views for R

Finding manuals

Package dependencies

Package management in R

Package management in Python

Package management in Julia

Package management in Octave

Conda – the package manager

Creating a set of programs in R and Python

Finding environment variables


Review questions and exercises

Chapter 7: Optimization in Anaconda

Why optimization is important

General issues for optimization problems

Expressing various kinds of optimization problems as LPP

Quadratic optimization

Optimization in R

Optimization in Python

Optimization in Julia

Optimization in Octave

Example #1 – stock portfolio optimization

Example #2 – optimal tax policy

Packages for optimization in R

Packages for optimization in Python

Packages for optimization in Octave

Packages for optimization in Julia


Review questions and exercises

Chapter 8: Unsupervised Learning in Anaconda

Introduction to unsupervised learning

Hierarchical clustering

k-means clustering

Introduction to Python packages – scipy

Introduction to Python packages – contrastive

Introduction to Python packages – sklearn (scikit-learn)

Introduction to R packages – rattle

Introduction to R packages – randomUniformForest

Introduction to R packages – Rmixmod

Implementation using Julia

Task view for Cluster Analysis


Review questions and exercises

Chapter 9: Supervised Learning in Anaconda

A glance at supervised learning


The k-nearest neighbors algorithm

Bayes classifiers

Reinforcement learning

Implementation of supervised learning via R

Introduction to RTextTools

Implementation via Python

Using the scikit-learn (sklearn) module

Implementation via Octave

Implementation via Julia

Task view for machine learning in R


Review questions and exercises

Chapter 10: Predictive Data Analytics – Modeling and Validation

Understanding predictive data analytics

Useful datasets

The AppliedPredictiveModeling R package

Time series analytics

Predicting future events


Visualizing components

R package – LiblineaR

R package – datarobot

R package – eclust

Model selection

Python package – model-catwalk

Python package – sklearn

Julia package – QuantEcon

Octave package – ltfat

Granger causality test


Review questions and exercises

Chapter 11: Anaconda Cloud

Introduction to Anaconda Cloud

Jupyter Notebook in depth

Formats of Jupyter Notebook

Sharing of notebooks

Sharing of projects

Sharing of environments

Replicating others' environments locally

Downloading a package from Anaconda


Review questions and exercises

Chapter 12: Distributed Computing, Parallel Computing, and HPCC

Introduction to distributed versus parallel computing

Task view for parallel processing

Sample programs in Python

Understanding MPI

R package Rmpi

R package plyr

R package parallel

R package snow

Parallel processing in Python

Parallel processing for word frequency

Parallel Monte Carlo options pricing

Compute nodes

Anaconda add-on

Introduction to HPCC


Review questions and exercises


Chapter 01: Ecosystem of Anaconda

Chapter 02: Anaconda Installation

Chapter 03: Data Basics

Chapter 04: Data Visualization

Chapter 05: Statistical Modeling in Anaconda

Chapter 06: Managing Packages

Chapter 07: Optimization in Anaconda

Chapter 08: Unsupervised Learning in Anaconda

Chapter 09: Supervised Learning in Anaconda

Chapter 10: Predictive Data Analytics – Modeling and Validation

Chapter 11: Anaconda Cloud

Chapter 12: Distributed Computing, Parallel Computing, and HPCC

Other Books You May Enjoy
