Notes on the apply family of functions

Preface

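In R, for loops are often slow, especially when the result object is grown inside the loop. Explicit for loops are therefore usually avoided in favour of other approaches, namely the apply family of functions.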

Members of the apply family

The apply family includes apply, sapply, lapply, vapply, tapply and others. Their usage is described below.

apply()

The usage of apply is:

apply(X, MARGIN, FUN, ...)

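Here X is a matrix or array. For a matrix, MARGIN is 1 or 2: 1 applies the function to each row and 2 to each column. FUN is the function to apply, and ... are optional arguments passed on to FUN. Here is an example: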

x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
x
# x1 x2
# [1,]  3  4
# [2,]  3  3
# [3,]  3  2
# [4,]  3  1
# [5,]  3  2
# [6,]  3  3
# [7,]  3  4
# [8,]  3  5
dimnames(x)[[1]] <- letters[1:8]
x
# x1 x2
# a  3  4
# b  3  3
# c  3  2
# d  3  1
# e  3  2
# f  3  3
# g  3  4
# h  3  5
apply(x, 2, sd)
# standard deviation of each column
# x1       x2 
# 0.000000 1.309307
apply(x, 2, sum)
# sum of each column
# x1 x2 
# 24 24 
apply(x, 1, sum)
# sum of each row
# a b c d e f g h 
# 7 6 5 4 5 6 7 8 
apply(x, 2, sort)
# sort each column
# x1 x2
# [1,]  3  1
# [2,]  3  2
# [3,]  3  2
# [4,]  3  3
# [5,]  3  3
# [6,]  3  4
# [7,]  3  4
# [8,]  3  5
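The example above sorts columns. Sorting rows has a catch that is easy to miss. When FUN returns a vector rather than a single value, apply() stores each result as a *column* of the output, so row-wise results come back transposed. A minimal sketch on a small illustrative matrix (the names `m`, `by_row` and `sorted_rows` are hypothetical):

```r
# A small illustrative matrix: 2 rows, 3 columns
m <- matrix(c(3, 1, 2,
              9, 7, 8), nrow = 2, byrow = TRUE)

# Sort each row. FUN returns a length-3 vector, and apply() stores
# each such result as a COLUMN, so the output is 3 x 2
by_row <- apply(m, 1, sort)
print(by_row)

# Transpose to restore the original orientation (one sorted row per row)
sorted_rows <- t(apply(m, 1, sort))
print(sorted_rows)
```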

Another example: we create a matrix and then use apply to compute values over its rows or columns:

counts <- matrix(c(3,2,4,6,5,1,8,6,1), ncol = 3)
colnames(counts) <- c('sparrow', 'dove', 'crow')
counts
apply(counts,2,max)
class(apply(counts,2,max))

Output:

> counts <- matrix(c(3,2,4,6,5,1,8,6,1), ncol = 3)
> colnames(counts) <- c('sparrow', 'dove', 'crow')
> counts
     sparrow dove crow
[1,]       3    6    8
[2,]       2    5    6
[3,]       4    1    1
> apply(counts,2,max)
sparrow    dove    crow 
      4       6       8 
> class(apply(counts,2,max))
[1] "numeric"

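This example uses a matrix of bird counts: each column is one species and each row is one day. We compute the largest count of each species over the 3 days. apply returns exactly what we want: a vector holding the maximum of each column, named after the original column names. The call to apply uses three arguments: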

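The object the function works on, here the matrix counts;
The margin (dimension) over which the function is applied: 1 means rows and 2 means columns. We want to work on columns here, so we use 2. More arguments can also be passed, as discussed later;
The function to execute: we want the maximum of each column, so we use max. The function can be given either as a quoted name ("max") or unquoted (max).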
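The third argument can be a function object, the function's name as a string, or an anonymous function. apply() resolves it with match.fun(). A short sketch reusing the counts matrix; the column "spread" (max minus min) is an illustrative statistic not in the original:

```r
counts <- matrix(c(3, 2, 4, 6, 5, 1, 8, 6, 1), ncol = 3)
colnames(counts) <- c('sparrow', 'dove', 'crow')

# Function object and quoted name give the same result
a <- apply(counts, 2, max)
b <- apply(counts, 2, "max")

# An anonymous function: the spread (max - min) of each column
spread <- apply(counts, 2, function(col) max(col) - min(col))
print(spread)
```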

Passing additional arguments

Suppose no doves were seen on day 2. We then set the corresponding entry of the counts matrix to NA:

counts[2,2] <- NA
counts

Output:

> counts[2,2] <- NA
> counts
     sparrow dove crow
[1,]       3    6    8
[2,]       2   NA    6
[3,]       4    1    1

If we still use max:

> apply(counts,2,max)
sparrow    dove    crow 
      4      NA       8

The result now contains NA, which is not what we want. We therefore pass an extra argument, na.rm, to apply(), which forwards it to max():

> apply(counts,2,max, na.rm = TRUE)
sparrow    dove    crow 
      4       6       8
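Any named argument placed after FUN is forwarded through `...`. The same call can also be written with an anonymous function. For column means and sums specifically, base R also provides colMeans()/colSums(), which accept na.rm as well. A sketch comparing the three forms on the counts matrix with the NA from above:

```r
counts <- matrix(c(3, 2, 4, 6, 5, 1, 8, 6, 1), ncol = 3)
colnames(counts) <- c('sparrow', 'dove', 'crow')
counts[2, 2] <- NA

# na.rm = TRUE is forwarded to mean() through apply()'s '...'
m1 <- apply(counts, 2, mean, na.rm = TRUE)

# Equivalent anonymous-function form
m2 <- apply(counts, 2, function(col) mean(col, na.rm = TRUE))

# Dedicated base R helper for column means
m3 <- colMeans(counts, na.rm = TRUE)
print(m1)
```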

sapply()

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

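The first argument X is a vector (or list);
The second argument FUN is the function to apply;
... are further arguments passed to FUN, while simplify and USE.NAMES control whether the result is simplified and named.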

sapply() example 1

switch() cannot handle a vector directly, because its first argument must be a single value. sapply() can vectorize it by using switch() as the target function:

> sapply(c('a','b'),switch,a='Hello',b='Goodbye')
        a         b 
  "Hello" "Goodbye"

In this example, c('a','b') is the vector to process and switch is the target function. a='Hello', b='Goodbye' are arguments passed on to switch.

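switch is called once for each value in the input vector, first 'a' and then 'b', each time with a='Hello' and b='Goodbye' as additional arguments. sapply() then combines the two return values into a new vector. Because the input is a character vector, sapply() uses its values c('a','b') as the names of the result.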
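The naming behaviour comes from sapply()'s USE.NAMES argument, which defaults to TRUE. A brief sketch showing the named default and an unnamed result with USE.NAMES = FALSE (the variable names are illustrative):

```r
# Default: a character X supplies its own values as names
greet <- sapply(c('a', 'b'), switch, a = 'Hello', b = 'Goodbye')
print(greet)

# USE.NAMES is matched by sapply() itself, not passed to switch()
plain <- sapply(c('a', 'b'), switch, a = 'Hello', b = 'Goodbye',
                USE.NAMES = FALSE)
print(plain)
```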

sapply() example 2

Compute the mean of each column of the built-in iris data set:

head(iris)
sapply(iris, mean)

Output:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> sapply(iris, mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
    5.843333     3.057333     3.758000     1.199333           NA 
Warning message:
In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA

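R issues a warning because Species is not a numeric column. We can instead pass sapply() a function that checks the type of its argument. It returns the mean for a numeric column and NA otherwise: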

> sapply(iris, function(x) ifelse(is.numeric(x), mean(x), NA))
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
    5.843333     3.057333     3.758000     1.199333           NA

The expression function(x) ifelse(is.numeric(x), mean(x), NA) above is an anonymous function.
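Two alternatives to the approach above are sketched here. The first selects the numeric columns before averaging. The second uses a scalar `if`/`else` inside the anonymous function, since `ifelse()` is designed for element-wise work on vectors. Both are illustrative variants, not the original author's code:

```r
# Logical vector: which columns of iris are numeric?
num_cols <- sapply(iris, is.numeric)

# Keep only those columns, then take their means (no warning, no NA)
col_means <- sapply(iris[num_cols], mean)
print(col_means)

# Scalar if/else inside the anonymous function
safe_mean <- sapply(iris, function(x) if (is.numeric(x)) mean(x) else NA)
print(safe_mean)
```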

The `switch()` function

A quick review of switch(). As its name suggests, it switches (selects) between alternatives. Its syntax is:

switch(EXPR, ...)

EXPR is an expression that evaluates to a number or a character string, and ... is the list of alternatives.

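Suppose EXPR evaluates to an integer between 1 and the number of alternatives. switch() then returns the alternative at that position. If the value is out of range, nothing is printed, because switch() returns NULL invisibly. For example: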

> x <- 2
> switch(x, "position 1", "position 2", "position 3")
[1] "position 2"

x is 2, so switch() returns the second alternative.

If EXPR is a string, switch() returns the alternative whose name matches it:

> switch("c", a = "position 1", b = "position 2", c = "position 3")
[1] "position 3"
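switch() has two more behaviours worth knowing. An unnamed last alternative acts as a default when a string matches nothing. An empty alternative "falls through" to the next non-empty one. A sketch of both, plus the out-of-range numeric case described above (the helper functions `pick` and `size` are hypothetical):

```r
# Numeric EXPR out of range: switch() returns NULL invisibly
out <- switch(4, "position 1", "position 2", "position 3")

# Character EXPR with an unnamed default as the last alternative
pick <- function(key) switch(key, a = "position 1", b = "position 2", "other")

# Fall-through: 'small' has no value, so it takes the value of 'medium'
size <- function(key) switch(key, small = , medium = "not large", large = "large")

print(pick("b"))
print(pick("z"))
print(size("small"))
```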

Applying functions to data frames

sapply() also works on lists and data frames. It treats each element of the list (each column of a data frame) as the object the function is applied to:

clients <- data.frame(
  hours = c(20, 110, 120, 23),
  type = c("private","public","abroad","other"),
  public = c(TRUE, TRUE, FALSE, FALSE)
)
str(clients)
sapply(clients, class)
class(sapply(clients, class))

Output:

> str(clients)
'data.frame':	4 obs. of  3 variables:
 $ hours : num  20 110 120 23
 $ type  : Factor w/ 4 levels "abroad","other",..: 3 4 1 2
 $ public: logi  TRUE TRUE FALSE FALSE
> sapply(clients, class)
    hours      type    public 
"numeric"  "factor" "logical" 
> class(sapply(clients, class))
[1] "character"

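We created a data frame clients with three variables: hours, type and public. Passing class() to sapply() returns the class of each variable as a character vector. This output was produced with a version of R earlier than 4.0.0. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so type is stored as character and its class is "character".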
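The vector result depends on class() returning one string per column. An ordered factor, for instance, has the two-element class c("ordered", "factor"). Because the results then differ in length, sapply() falls back to a list. A sketch with a hypothetical data frame, plus one way to keep a vector (taking only the first class):

```r
df <- data.frame(
  x = 1:3,
  size = factor(c("S", "M", "L"), levels = c("S", "M", "L"), ordered = TRUE)
)

# class(df$size) has length 2, so the results cannot form a vector
res <- sapply(df, class)
str(res)

# Keeping just the first class gives a named character vector again
first_class <- sapply(df, function(col) class(col)[1])
print(first_class)
```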

Simplifying `sapply()` results

sapply() does not always return a vector. Its basic result is a list, which is simplified to a vector or matrix when possible:

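If every call of the function on the elements of the list or vector returns a single value (length 1), sapply() simplifies the result to a vector;
If every call returns a vector of the same length, the result is simplified to a matrix;
In all other cases a list is returned.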
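The three simplification rules can be checked on a plain integer sequence. This is a minimal sketch; the functions applied are arbitrary illustrations:

```r
# Rule 1: every result has length 1 -> vector
v <- sapply(1:3, function(i) i^2)

# Rule 2: every result has the same length (here 2) -> matrix,
# one column per input element; the first result's names become row names
m <- sapply(1:3, function(i) c(min = i, max = i * 10))

# Rule 3: lengths differ (1, 2, 3) -> list
l <- sapply(1:3, seq_len)

print(v)
print(m)
str(l)
```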

For example, suppose we want the unique values of every variable in a new, smaller clients data frame, with each distinct value listed once (using unique()):

clients <- data.frame(
  hours = c(20, 120, 120, 23),
  public = c(TRUE, TRUE,TRUE, TRUE))
sapply(clients,unique)

Output:

> sapply(clients,unique)
$hours
[1]  20 120  23
$public
[1] TRUE

sapply() has an argument simplify. With simplify = FALSE the result is not simplified:

clients <- data.frame(
  hours = c(20, 120, 120, 23),
  public = c(TRUE, TRUE,TRUE, TRUE))
sapply(clients,class,simplify = FALSE)

Output:

> sapply(clients,class,simplify = FALSE)
$hours
[1] "numeric"
$public
[1] "logical"

lapply()

The l in lapply() stands for list. A closely related function is sapply(), where the s stands for simplify. lapply() always returns a list, whereas sapply() simplifies its result where possible.

The usage of lapply() is:

lapply(X, FUN, ...)

`lapply()` example 1

Let's go straight to an example:

x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
x
lapply(x, mean)
lapply_result <- lapply(x, mean)
str(lapply_result)

Output:

> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
> x
$a
 [1]  1  2  3  4  5  6  7  8  9 10
$beta
[1]  0.04978707  0.13533528  0.36787944  1.00000000  2.71828183  7.38905610 20.08553692
$logic
[1]  TRUE FALSE FALSE  TRUE
> lapply(x, mean)
$a
[1] 5.5
$beta
[1] 4.535125
$logic
[1] 0.5
> lapply_result <- lapply(x, mean)
> str(lapply_result)
List of 3
 $ a    : num 5.5
 $ beta : num 4.54
 $ logic: num 0.5

As the output shows, lapply() applies the function to each element of a list and returns a list.
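To make the list-versus-simplified contrast concrete, the sketch below runs lapply() and sapply() on the same list. For length-1 results, unlisting the lapply() output gives the sapply() output. It also shows that lapply() accepts an atomic vector, handling each element separately (`sq` is an illustrative name):

```r
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))

res_l <- lapply(x, mean)  # always a list; names are kept
res_s <- sapply(x, mean)  # simplified to a named numeric vector

# lapply() also accepts an atomic vector
sq <- lapply(1:3, function(i) i * i)

str(res_l)
print(res_s)
```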

`lapply()` example 2

Another example. Compare the output of lapply() with that of sapply():

> lapply(iris, class)
$Sepal.Length
[1] "numeric"
$Sepal.Width
[1] "numeric"
$Petal.Length
[1] "numeric"
$Petal.Width
[1] "numeric"
$Species
[1] "factor"
> sapply(iris, class)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

`lapply()` example 3

One more example. First create a list:

prime_factors <- list(
  two = 2, 
  three = 3, 
  four = c(3, 2),
  five = 5,
  six = c(2,3),
  seven = 7,
  eight = c(2, 2, 2),
  nine = c(3,3),
  ten = c(2,5)
)
head(prime_factors)

Output:

> head(prime_factors)
$two
[1] 2
$three
[1] 3
$four
[1] 3 2
$five
[1] 5
$six
[1] 2 3
$seven
[1] 7

Now the task is to find the unique values within each list element, in a vectorized way.

The conventional approach is a for loop:

unique_primes <- vector("list", length(prime_factors))
for (i in seq_along(prime_factors))
{
  unique_primes[[i]] <- unique(prime_factors[[i]])
}
names(unique_primes) <- names(prime_factors)
unique_primes

Output:

> unique_primes
$two
[1] 2
$three
[1] 3
$four
[1] 3 2
$five
[1] 5
$six
[1] 2 3
$seven
[1] 7
$eight
[1] 2
$nine
[1] 3
$ten
[1] 2 5

lapply() achieves the same result with much simpler code:

> lapply(prime_factors, unique)
$two
[1] 2
$three
[1] 3
$four
[1] 3 2
$five
[1] 5
$six
[1] 2 3
$seven
[1] 7
$eight
[1] 2
$nine
[1] 3
$ten
[1] 2 5

`lapply()` example 4

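In all the examples so far, the functions passed to lapply, vapply and sapply took only one argument, for example unique(), mean() and length(). Each of them receives one element of the input at a time. What if the function we pass needs two arguments?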

There are several cases.

In the first case, the extra arguments are fixed scalars. Give lapply the function name and then the extra arguments. For example, the function rep() takes two arguments here:

> rep(c(1,2,3,4),times=5)
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

Here times is given a single number (a scalar). When passing rep() to lapply(), we can simply add the times argument after it:

> complemented <- c(2,3,6,18)
> lapply(complemented, rep, times=4)
[[1]]
[1] 2 2 2 2
[[2]]
[1] 3 3 3 3
[[3]]
[1] 6 6 6 6
[[4]]
[1] 18 18 18 18

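But what if the first argument of the function should be a fixed scalar, and the elements we iterate over should go to a later argument?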

In that case we can write our own function that wraps the target function:

complemented <- c(2,3,6,18)
rep4x <- function(x) rep(4, times=x)
lapply(complemented, rep4x)

Output:

> complemented <- c(2,3,6,18)
> rep4x <- function(x) rep(4, times=x)
> lapply(complemented, rep4x)
[[1]]
[1] 4 4
[[2]]
[1] 4 4 4
[[3]]
[1] 4 4 4 4 4 4
[[4]]
 [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

We can simplify this further with an anonymous function. The code above is equivalent to:

complemented <- c(2,3,6,18)
lapply(complemented, function(x) rep(4, times=x))
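Two further options are sketched below. Neither is from the original notes. First, naming the fixed argument (x = 4) means the element supplied by lapply() is matched positionally to the next argument of rep(), which is times, so no wrapper is needed. Second, when *both* arguments vary, base R's mapply() walks over several vectors in parallel. (Since R 4.1.0, `\(n) rep(4, times = n)` is also accepted as shorthand for an anonymous function.)

```r
complemented <- c(2, 3, 6, 18)

# x = 4 is matched by name; each element of complemented goes to 'times'
r1 <- lapply(complemented, rep, x = 4)

# Same result with an explicit anonymous function
r2 <- lapply(complemented, function(n) rep(4, times = n))

# Both arguments vary: mapply() pairs x[i] with times[i];
# results have different lengths, so a list is returned
r3 <- mapply(rep, x = c(1, 2, 3), times = c(3, 2, 1))
str(r3)
```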

The `vapply()` function

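vapply() applies a function to a list and returns a vector. Like lapply(), it takes a list and a function. vapply() also requires a third argument, FUN.VALUE, a template describing the type and length of each return value.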

Here is the data from the lapply() section again:

prime_factors <- list(
  two = 2, 
  three = 3, 
  four = c(3, 2),
  five = 5,
  six = c(2,3),
  seven = 7,
  eight = c(2, 2, 2),
  nine = c(3,3),
  ten = c(2,5)
)
head(prime_factors)
vapply(prime_factors, length, numeric(1))

Output:

> prime_factors <- list(
+   two = 2, 
+   three = 3, 
+   four = c(3, 2),
+   five = 5,
+   six = c(2,3),
+   seven = 7,
+   eight = c(2, 2, 2),
+   nine = c(3,3),
+   ten = c(2,5)
+ )
> head(prime_factors)
$two
[1] 2
$three
[1] 3
$four
[1] 3 2
$five
[1] 5
$six
[1] 2 3
$seven
[1] 7
> vapply(prime_factors, length, numeric(1))
  two three  four  five   six seven eight  nine   ten 
    1     1     2     1     2     1     3     2     2

If an output does not match the template, vapply() throws an error. For comparison, here are the outputs of lapply() and sapply():

> lapply(prime_factors, length)
$two
[1] 1
$three
[1] 1
$four
[1] 2
$five
[1] 1
$six
[1] 2
$seven
[1] 1
$eight
[1] 3
$nine
[1] 2
$ten
[1] 2
> sapply(prime_factors, length)
  two three  four  five   six seven eight  nine   ten 
    1     1     2     1     2     1     3     2     2
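The sketch below shows vapply()'s template in action on a shortened, hypothetical version of the list. A longer template such as numeric(2) yields a matrix. unique() returns results of varying length, so it breaks the numeric(1) template and raises an error (caught here so the script continues). The last lines show another difference: on an empty input, sapply() returns an empty list, while vapply() still returns a typed numeric vector.

```r
prime_factors <- list(two = 2, four = c(2, 2), six = c(2, 3))

# Every result must be a single number, as numeric(1) demands
lens <- vapply(prime_factors, length, numeric(1))

# range() always returns two numbers, so numeric(2) gives a 2 x 3 matrix
rng <- vapply(prime_factors, range, numeric(2))

# unique() returns 1 or 2 values: the template is violated -> error
err <- tryCatch(vapply(prime_factors, unique, numeric(1)),
                error = function(e) conditionMessage(e))

# Empty input: sapply() gives list(), vapply() gives numeric(0)
empty_s <- sapply(list(), length)
empty_v <- vapply(list(), length, numeric(1))

print(lens)
print(rng)
print(err)
```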

tapply()

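tapply() is often used with factors. The t refers to table, since factors are closely tied to contingency tables. tapply() (temporarily) splits the data into groups, one per factor level (or, with several factors, one per combination of levels). It then applies a function g() to each of these subvectors and returns the results as a table. A simple example: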

ages <- c(25,26,55,37,21,42)
affils <- c("R","D","D","R","U","D")
tapply(ages,affils,mean)
# D  R  U 
# 41 31 21

Another example:

> tapply(iris$Sepal.Length, iris$Species, mean)
    setosa versicolor  virginica 
     5.006      5.936      6.588

Here R takes the Sepal.Length column, splits it by Species, and computes the mean of each group. This is a classic pattern in R code: split, apply, combine (SAC for short).
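The three SAC steps can be written out explicitly with split() and sapply(). This reproduces what tapply() does in one call. A sketch using the ages/affiliations example from the start of this section:

```r
ages <- c(25, 26, 55, 37, 21, 42)
affils <- c("R", "D", "D", "R", "U", "D")

# Split: a list with one vector of ages per affiliation (D, R, U)
groups <- split(ages, affils)

# Apply + combine: mean of each group, simplified to a named vector
by_hand <- sapply(groups, mean)

# The same computation in one step
via_tapply <- tapply(ages, affils, mean)

print(by_hand)
print(via_tapply)
```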

Creating higher-dimensional tables with `tapply()`

Higher-dimensional table example 1

Take the mtcars data set as an example. First look at its structure:

> str(mtcars)
'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

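The variable am is the transmission type: 0 means automatic and 1 means manual. The 0/1 coding is not very readable, so we create a new object cars. It is a copy of mtcars in which am is converted to a factor: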

cars <- within(mtcars, 
               am <- factor(am, levels = 0:1, labels = c("Automatic", "Manual")))
str(cars)

Output:

> str(cars)
'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Now use tapply() to get the average miles per gallon (mpg) for automatic and manual cars:

> with(cars, tapply(mpg, am, mean))
Automatic    Manual 
 17.14737  24.39231

This is a one-dimensional table. Combining transmission type (am) with the number of forward gears (gear) gives:

> with(cars, tapply(mpg, list(gear, am), mean))
  Automatic Manual
3  16.10667     NA
4  21.05000 26.275
5        NA 21.380

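The NA cells mark combinations that do not occur in the data; for example, no car in mtcars has a manual transmission with 3 gears. For building summary tables, tapply() is somewhat similar to table(). However, table() only counts observations, whereas tapply() can apply any function.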
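To see the relationship with table(), the sketch below counts cars per gear/transmission cell both ways. tapply() with length() gives the same counts, except that empty cells come out as NA. tapply()'s default argument (available in R 3.4.0 and later) sets a different fill value:

```r
# Same recoding of am as above
cars <- within(mtcars,
               am <- factor(am, levels = 0:1, labels = c("Automatic", "Manual")))

# table() counts the cars in each gear x transmission cell (0 for empty cells)
n_table <- with(cars, table(gear, am))

# tapply() with length() also counts, but empty cells are NA by default
n_tapply <- with(cars, tapply(mpg, list(gear, am), length))

# 'default' chooses the fill value for empty cells
n_tapply0 <- with(cars, tapply(mpg, list(gear, am), length, default = 0))

print(n_table)
print(n_tapply)
```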

Higher-dimensional table example 2

Here is another higher-dimensional table example.

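Suppose we have an economic data set with gender, age and income variables. We call tapply(x, f, g), where x is income and f is a pair of factors: gender, and an indicator of whether the person is older than 25. We want the mean income by gender and age group, so we set g() to mean(). tapply() then returns the mean income of each of the four subgroups: (1) men aged 25 or under; (2) men over 25; (3) women aged 25 or under; (4) women over 25. The code: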

d <- data.frame(gender = c("M","M","F","M","F","F"),
                age = c(47,59,21,32,33,24),
                income = c(55000,88000,32450,76500,123000,45650))
# 1 if older than 25, otherwise 0
d$over25 <- ifelse(d$age > 25, 1, 0)
d
#   gender age income over25
# 1      M  47  55000      1
# 2      M  59  88000      1
# 3      F  21  32450      0
# 4      M  32  76500      1
# 5      F  33 123000      1
# 6      F  24  45650      0
tapply(d$income, list(d$gender, d$over25), mean)
#       0         1
# F 39050 123000.00
# M    NA  73166.67
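A possible refinement, not from the original: use a labelled factor instead of the 0/1 code, and name the elements of the INDEX list, so the table carries readable row, column and dimension names. The labels "25 and under"/"over 25" are illustrative choices:

```r
d <- data.frame(gender = c("M", "M", "F", "M", "F", "F"),
                age = c(47, 59, 21, 32, 33, 24),
                income = c(55000, 88000, 32450, 76500, 123000, 45650))

# Labelled factor instead of a 0/1 indicator
d$age_group <- factor(d$age > 25, levels = c(FALSE, TRUE),
                      labels = c("25 and under", "over 25"))

# Naming the INDEX list gives the result named dimensions
tab <- with(d, tapply(income, list(gender = gender, age_group = age_group), mean))
print(tab)
```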

split()

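tapply() splits a vector into groups and applies a function to each group, while split() only does the grouping, without computing anything. Its basic form is split(x, f), where x is the data and f is a factor or a list of factors. It divides x into groups and returns the groups as a list. Note that split() works on both vectors and data frames, whereas tapply() expects a vector: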

> d
  gender age income over25
1      M  47  55000      1
2      M  59  88000      1
3      F  21  32450      0
4      M  32  76500      1
5      F  33 123000      1
6      F  24  45650      0
> split(d$income,list(d$gender,d$over25))
$F.0
[1] 32450 45650
$M.0
numeric(0)
$F.1
[1] 123000
$M.1
[1] 55000 88000 76500
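Two follow-ups are sketched here on the same data. Splitting a whole data frame returns a list of data frames, which can then be processed with sapply(). The argument drop = TRUE removes empty combinations such as M.0 in the output above. The names `parts`, `mean_income` and `kept` are illustrative:

```r
d <- data.frame(gender = c("M", "M", "F", "M", "F", "F"),
                age = c(47, 59, 21, 32, 33, 24),
                income = c(55000, 88000, 32450, 76500, 123000, 45650))
d$over25 <- ifelse(d$age > 25, 1, 0)

# split() on a data frame: one data frame per gender
parts <- split(d, d$gender)

# Apply + combine on the pieces: mean income per gender
mean_income <- sapply(parts, function(p) mean(p$income))

# drop = TRUE discards combinations with no observations (M.0 here)
kept <- split(d$income, list(d$gender, d$over25), drop = TRUE)

print(mean_income)
print(names(kept))
```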

by()

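The first argument of tapply() must be a vector, which makes tapply() ill-suited to matrices and data frames. For these, use by(). It works like tapply() but accepts a matrix or data frame. The first argument is the data, the second is the grouping factor, and the third is the function to apply. Here we fit, for each species, a linear regression of sepal length (column 1) on sepal width (column 2):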

head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
by(iris,iris$Species,function(m) lm(m[,1]~m[,2]))
# iris$Species: setosa
# Call:
#   lm(formula = m[, 1] ~ m[, 2])
# Coefficients:
#   (Intercept)       m[, 2]  
# 2.6390       0.6905  
# --------------------------------------------------------------------------------- 
#   iris$Species: versicolor
# Call:
#   lm(formula = m[, 1] ~ m[, 2])
# Coefficients:
#   (Intercept)       m[, 2]  
# 3.5397       0.8651  
# --------------------------------------------------------------------------------- 
#   iris$Species: virginica
# Call:
#   lm(formula = m[, 1] ~ m[, 2])
# Coefficients:
#   (Intercept)       m[, 2]  
# 3.9068       0.9015
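The object returned by by() is list-like, so the per-group results can be post-processed with sapply(). The sketch below collects the regression coefficients into one matrix with a column per species. It uses a formula with column names (data = m) instead of m[,1] ~ m[,2], which fits the same model:

```r
# One linear model per species; 'fits' is an object of class "by"
fits <- by(iris, iris$Species,
           function(m) lm(Sepal.Length ~ Sepal.Width, data = m))

# Extract the coefficients: a 2 x 3 matrix (intercept and slope per species)
coefs <- sapply(fits, coef)
print(round(coefs, 4))
```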

aggregate()

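aggregate() applies a tapply()-style computation to every variable within each group. For example, in R's CO2 data set we group by Treatment and compute the median of conc and uptake (columns 4 and 5) for each group: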

> head(CO2)
  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3
6   Qn1 Quebec nonchilled  675   39.2
> aggregate(CO2[,c(4,5)],list(CO2$Treatment),median)
     Group.1 conc uptake
1 nonchilled  350   31.3
2    chilled  350   19.7
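aggregate() also has a formula interface. The left side lists the variables to summarise and the right side the grouping factors, and the result's columns keep their original names. A sketch that reproduces the medians above and then groups by two factors at once:

```r
# Same medians as above, via the formula interface
med <- aggregate(cbind(conc, uptake) ~ Treatment, data = CO2, FUN = median)
print(med)

# Grouping by two factors: mean uptake for each Type x Treatment combination
by_two <- aggregate(uptake ~ Type + Treatment, data = CO2, FUN = mean)
print(by_two)
```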
