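Book Information
Big Data Analytics with R (大数据分析 R语言实现)
- Author: Simon Walkowiak (UK)
- Publisher: Nanjing: Southeast University Press
- ISBN: 9787564173616
- Publication year: 2017
- Stated page count: 490 pages
- File size: 61 MB
- File page count: 503 pages
- Subject terms: Programming languages - Program design - English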
Table of Contents
Preface
Chapter 1: The Era of Big Data
Big Data - The monster re-defined
Big Data toolbox - dealing with the giant
Hadoop - the elephant in the room
Databases
Hadoop Spark-ed up
R - The unsung Big Data hero
Summary
Chapter 2: Introduction to R Programming Language and Statistical Environment
Learning R
Revisiting R basics
Getting R and RStudio ready
Setting the URLs to R repositories
R data structures
Vectors
Scalars
Matrices
Arrays
Data frames
Lists
Exporting R data objects
Applied data science with R
Importing data from different formats
Exploratory Data Analysis
Data aggregations and contingency tables
Hypothesis testing and statistical inference
Tests of differences
Independent t-test example (with power and effect size estimates)
ANOVA example
Tests of relationships
An example of Pearson's r correlations
Multiple regression example
Data visualization packages
Summary
Chapter 3: Unleashing the Power of R from Within
Traditional limitations of R
Out-of-memory data
Processing speed
To the memory limits and beyond
Data transformations and aggregations with the ff and ffbase packages
Generalized linear models with the ff and ffbase packages
Logistic regression example with ffbase and biglm
Expanding memory with the bigmemory package
Parallel R
From bigmemory to faster computations
An apply() example with the big.matrix object
A for() loop example with the ffdf object
Using apply() and for() loop examples on a data.frame
A parallel package example
A foreach package example
The future of parallel processing in R
Utilizing Graphics Processing Units with R
Multi-threading with Microsoft R Open distribution
Parallel machine learning with H2O and R
Boosting R performance with the data.table package and other tools
Fast data import and manipulation with the data.table package
Data import with data.table
Lightning-fast subsets and aggregations on data.table
Chaining, more complex aggregations, and pivot tables with data.table
Writing better R code
Summary
Chapter 4: Hadoop and MapReduce Framework for R
Hadoop architecture
Hadoop Distributed File System
MapReduce framework
A simple MapReduce word count example
Other Hadoop native tools
Learning Hadoop
A single-node Hadoop in Cloud
Deploying Hortonworks Sandbox on Azure
A word count example in Hadoop using Java
A word count example in Hadoop using the R language
RStudio Server on a Linux RedHat/CentOS virtual machine
Installing and configuring RHadoop packages
HDFS management and MapReduce in R - a word count example
HDInsight - a multi-node Hadoop cluster on Azure
Creating your first HDInsight cluster
Creating a new Resource Group
Deploying a Virtual Network
Creating a Network Security Group
Setting up and configuring an HDInsight cluster
Starting the cluster and exploring Ambari
Connecting to the HDInsight cluster and installing RStudio Server
Adding a new inbound security rule for port 8787
Editing the Virtual Network's public IP address for the head node
Smart energy meter readings analysis example - using R on HDInsight cluster
Summary
Chapter 5: R with Relational Database Management Systems (RDBMSs)
Relational Database Management Systems (RDBMSs)
A short overview of used RDBMSs
Structured Query Language (SQL)
SQLite with R
Preparing and importing data into a local SQLite database
Connecting to SQLite from RStudio
MariaDB with R on a Amazon EC2 instance
Preparing the EC2 instance and RStudio Server for use
Preparing MariaDB and data for use
Working with MariaDB from RStudio
PostgreSQL with R on Amazon RDS
Launching an Amazon RDS database instance
Preparing and uploading data to Amazon RDS
Remotely querying PostgreSQL on Amazon RDS from RStudio
Summary
Chapter 6: R with Non-Relational (NoSQL) Databases
Introduction to NoSQL databases
Review of leading non-relational databases
MongoDB with R
Introduction to MongoDB
MongoDB data models
Installing MongoDB with R on Amazon EC2
Processing Big Data using MongoDB with R
Importing data into MongoDB and basic MongoDB commands
MongoDB with R using the rmongodb package
MongoDB with R using the RMongo package
MongoDB with R using the mongolite package
HBase with R
Azure HDInsight with HBase and RStudio Server
Importing the data to HDFS and HBase
Reading and querying HBase using the rhbase package
Summary
Chapter 7: Faster than Hadoop - Spark with R
Spark for Big Data analytics
Spark with R on a multi-node HDInsight cluster
Launching HDInsight with Spark and R/RStudio
Reading the data into HDFS and Hive
Getting the data into HDFS
Importing data from HDFS to Hive
Bay Area Bike Share analysis using SparkR
Summary
Chapter 8: Machine Learning Methods for Big Data in R
What is machine learning?
Supervised and unsupervised machine learning methods
Classification and clustering algorithms
Machine learning methods with R
Big Data machine learning tools
GLM example with Spark and R on the HDInsight cluster
Preparing the Spark cluster and reading the data from HDFS
Logistic regression in Spark with R
Naive Bayes with H2O on Hadoop with R
Running an H2O instance on Hadoop with R
Reading and exploring the data in H2O
Naive Bayes on H2O with R
Neural Networks with H2O on Hadoop with R
How do Neural Networks work?
Running Deep Learning models on H2O
Summary
Chapter 9: The Future of R - Big, Fast, and Smart Data
The current state of Big Data analytics with R
Out-of-memory data on a single machine
Faster data processing with R
Hadoop with R
Spark with R
R with databases
Machine learning with R
The future of R
Big Data
Fast data
Smart data
Where to go next
Summary
Index