What is dplyr?

dplyr is a powerful R package for transforming and summarizing tabular data, that is, data organized in rows and columns.

Why is it useful?

The package contains a set of functions (or “verbs”) that perform common data manipulation operations: filtering rows with filter(), selecting specific columns with select(), re-ordering rows with arrange(), adding new columns with mutate() and summarizing data with summarise().
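As a rough illustration of these verbs, here is a minimal sketch that applies each one to the built-in mtcars data set. The result names (efficient, slim, sorted, with_kg, overall) and the mpg > 25 threshold are chosen only for this example.

```r
library(dplyr)

# Filter rows: keep cars with mpg above 25
efficient <- filter(mtcars, mpg > 25)

# Select specific columns
slim <- select(mtcars, mpg, cyl, wt)

# Re-order rows by descending mpg
sorted <- arrange(mtcars, desc(mpg))

# Add a new column: weight in kilograms (wt is recorded in units of 1000 lb)
with_kg <- mutate(mtcars, wt_kg = wt * 1000 * 0.45359237)

# Summarize the whole table into a single row
overall <- summarise(mtcars, mean_mpg = mean(mpg), n_cars = n())
```

Every verb takes a data frame as its first argument and returns a new data frame, which is why the calls can be combined freely.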

How does it compare to using base R functions?

If you are familiar with R, you are probably familiar with base R functions such as split(), subset(), apply(), sapply(), lapply(), tapply() and aggregate(). Compared to these base functions, the functions in dplyr are easier to work with, have a more consistent syntax, and are designed for data analysis on data frames rather than just vectors.
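To make the comparison concrete, the sketch below computes the mean mpg per cylinder count in mtcars twice, once with base R's aggregate() and once with dplyr. The variable names base_result and dplyr_result are made up for this example.

```r
library(dplyr)

# Base R: formula interface of aggregate()
base_result <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# dplyr: the same computation written as a chain of verbs
dplyr_result <- mtcars %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))
```

Both calls yield one row per value of cyl. The dplyr version reads as a sequence of steps, and the same group_by() plus summarise() pattern applies to any other summary.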

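I. Why learn dplyr

dplyr makes data manipulation easier in three ways:

1. By constraining your options, it simplifies how you think about and carry out common data manipulation tasks.

2. It provides functions for the most common data manipulation operations.

3. It uses efficient data storage backends, so data is processed quickly.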

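II. data frame tbl

dplyr operates on data in data.frame form. When you are working with a large amount of data, however, it is worth converting it to a data frame tbl. Its main advantage is that printing a data frame tbl shows only a few rows and the columns that fit on one screen, and describes the rest in text.

Two main features: 1. Only the first 10 rows and the columns that fit on the screen are printed. 2. It always returns a data.frame (for example, subsetting with [ never drops the result to a vector).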

Usage:

tbl_df(data)

Arguments:

data: a data.frame

Note: tbl_df() is deprecated as of dplyr 1.0.0; as_tibble() is the current equivalent.

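Examples:

library(dplyr)

# Printing a plain data.frame shows all 32 rows
mtcars

# Convert to a data frame tbl; printing now shows only the first 10 rows
ds <- tbl_df(mtcars)
ds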
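The second feature, always returning a data frame, can be checked directly. The following is a minimal sketch that uses tibble::as_tibble() rather than tbl_df(), on the assumption that a recent dplyr is installed (tibble is installed alongside it as a dependency).

```r
library(dplyr)

# Convert the built-in data.frame to a data frame tbl (tibble)
ds <- tibble::as_tibble(mtcars)

# Selecting a single column with [ : a plain data.frame drops to a vector,
# while a data frame tbl stays a data frame
plain_is_df <- is.data.frame(mtcars[, "mpg"])  # FALSE
tbl_is_df   <- is.data.frame(ds[, "mpg"])      # TRUE
```

Note that as_tibble() drops row names by default, so the car names in mtcars are not carried over unless they are first moved into a column.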
